Preface


A BOOK ABOUT SO CONTROVERSIAL A SUBJECT AS THE FOUNDATIONS of statistics
may have some value in the classroom, as I hope this one will, but it
cannot be a textbook, or manual of instruction, stating the accepted
facts about its subject, for there scarcely are any. Openly, or coyly
screened behind the polite conventions of what we call a disinterested
approach, it must, even more than other books, be an airing of its
author’s current opinions.

One who so airs his opinions has serious misgivings that (as may be
judged from other prefaces) he often tries to communicate along with
his book. First, he longs to know, for reasons that are not altogether
noble, whether he is really making a valuable contribution. His own
conceit, the encouragement of friends, and the confidence of his
publisher have given him hope, but he knows that the hopes of others in
his position have seldom been fully realized.

Again, what he has written is far from perfect, even to his biased
eye. He has stopped revising and called the book finished, because
one must sooner or later.

Finally, he fears that he himself, and still more such public as he
has, will forget that the book is tentative, that an author’s most recent
word need not be his last word.

The application of statistics interests some workers in almost every
field of empirical investigation, not only in science, but also in
commerce and industry. Moreover, the foundations of statistics are
connected conceptually with many disciplines outside of statistics
itself, particularly mathematics, philosophy, economics, and psychology,
a situation that, incidentally, must augment the natural misgivings of
an author in this field about his own competence. Those who read in
this book may, therefore, be diverse in background and interests. With
this consideration in mind, I have endeavored to keep the book as free
from technical prerequisites as its subject matter and its restriction to
a reasonable size permit.

Technical knowledge of statistics is nowhere assumed, but the reader
who has some general knowledge of statistics will be much better
prepared to understand and appraise this book. The books Statistics, by
L. H. C. Tippett, and On the Principles of Statistical Inference, by
A. Wald, listed in the Bibliography at the end of Appendix 3, are short
authoritative introductions to statistics, either of which would provide
some statistical background for this book. The books of Tippett and
Wald are so different in tone and emphasis that it would by no means
be wasteful to read them both, in that order.

Any but the most casual reader should have some formal preparation
in the theory of mathematical probability. Those acquainted with
moderately advanced theoretical statistics will automatically have this
preparation; others may acquire it, for example, by reading Theory of
Probability, by M. E. Munroe, or selected parts of An Introduction to
Probability Theory and Its Applications, by W. Feller, according to
their taste. In Feller's book, a thorough reading of the Introduction
and Chapter 1, and a casual reading of Chapters 5, 7, and 8 would be
sufficient.

The explicit mathematical prerequisites are not great; a year of
calculus would in principle be more than enough. But, in practice,
readers without some training in formal logic or one of the abstract
branches of mathematics usually taught only after calculus will, I fear,
find some of the long though elementary mathematical deductions quite
forbidding. For the sake of such readers, I therefore take the liberty
of giving some pedagogical advice here and elsewhere that mathematically
more mature readers will find superfluous and possibly irritating. In
the first place, it cannot be too strongly emphasized that a long
mathematical argument can be fully understood on first reading only when
it is very elementary indeed, relative to the reader's mathematical
knowledge. If one wants only the gist of it, he may read such material
once only; but otherwise he must expect to read it at least once again.
Serious reading of mathematics is best done sitting bolt upright on a
hard chair at a desk. Pencil and paper are nearly indispensable, for
there are always figures to be sketched and steps in the argument to be
verified by calculation. In this book, as in many mathematical books,
when exercises are indicated, it is absolutely essential that they be
read and nearly essential that they be worked, because they constitute
part of the exposition, the exercise form being adopted where it seems
to the author best for conveying the particular information at hand.

To some mathematicians, and even more to logicians, I must say a
word of apology for what they may consider lapses of rigor, such as
using the same symbol with more than one meaning and failing to
distinguish uniformly between the use and the mention of a symbol; but
they will understand that these lapses are sacrifices to what I take to
be general intelligibility and will have, I hope, no real difficulty in
repairing them.

Few will wish to read the whole book; therefore introductions to the
chapters and sections have been so written as not only to provide
orientation but also to facilitate skipping. In particular, safe detours
are indicated around mathematically advanced topics and other
digressions.

A few words in explanation of the conventions, such as those by which
internal and external references are made in this book, may be useful.

The abbreviation § 3.4 means Section 4 of Chapter 3; within Chapter
3 itself, this would be abbreviated still further to § 4. The
abbreviation (3.4.1) means the first numbered and displayed equation or
other expression in § 3.4; within Chapter 3, this would be abbreviated
still further to (4.1), and within § 3.4 simply to (1). Theorems, lemmas,
exercises, corollaries, figures, and tables are named by a similar system,
e.g., Theorem 3.4.1, Theorem 4.1, Theorem 1. Incidentally, the proofs
of theorems are terminated with the special punctuation mark ♦, a
device borrowed from Halmos’ Measure Theory.
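
The reference convention just described amounts to dropping whatever
part of the full label the reader's current position already supplies.
A minimal Python sketch of that rule follows; the function name
`abbreviate` and the tuple form (chapter, section, number) are
assumptions made for this illustration and are not notation used in the
book.

```python
def abbreviate(ref, chapter, section=None):
    """Abbreviate a reference given as (chapter, section, number).

    `chapter` and `section` say where the reference is being made;
    `section` may be left as None when only the chapter matters.
    """
    ref_chapter, ref_section, number = ref
    if ref_chapter != chapter:
        return f"({ref_chapter}.{ref_section}.{number})"  # full form
    if ref_section != section:
        return f"({ref_section}.{number})"                # same chapter
    return f"({number})"                                  # same section


print(abbreviate((3, 4, 1), chapter=5))             # cited from Chapter 5
print(abbreviate((3, 4, 1), chapter=3, section=2))  # cited from § 3.2
print(abbreviate((3, 4, 1), chapter=3, section=4))  # cited from § 3.4
```

The same rule would apply to theorems, exercises, and the other numbered
items, with the label word placed in front of the number.
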
Seven postulates, P1, P2, etc., are introduced over the course of
several chapters. For ready reference these are, with some explanatory
material, reproduced on the end papers.

Entries in the Bibliography at the end of Appendix 3 are designated
by a self-explanatory notation in square brackets. For example, the
works of Tippett, Wald, Munroe, Feller, and Halmos, already referred
to, are [T2], [W1], [M6], [F1], and [H2], respectively.

I often allude to a set of key references to a given topic. This means
a set of external references intended to lead the reader that wishes to
pursue that particular topic to the fullest and most recent
bibliographies; it has nothing to do with the merit or importance of the
works referred to.

Technical terms (except for non-verbal symbols) that are defined in
this book are printed in bold face or italics (depending on the
importance of the term for this book or for established usage) in the
context where the term is defined. These special fonts are occasionally
used for other purposes as well. Terms are sometimes used informally
(even in unofficial definitions) before being officially defined. Even
the official definitions are sometimes of necessity very loose,
corresponding to the well-known principle that, in a formal theory, some
terms must in strict logic be left undefined.

L. J. Savage

University of Chicago
April, 1954

CHAPTER 1


Introduction

1 The role of foundations

It is often argued academically that no science can be more secure
than its foundations, and that, if there is controversy about the
foundations, there must be even greater controversy about the higher
parts of the science. As a matter of fact, the foundations are the most
controversial parts of many, if not all, sciences. Physics and pure
mathematics are excellent examples of this phenomenon. As for statistics,
the foundations include, on any interpretation of which I have ever
heard, the foundations of probability, as controversial a subject as one
could name. As in other sciences, controversies over the foundations
of statistics reflect themselves to some extent in everyday practice, but
not nearly so catastrophically as one might imagine. I believe that
here, as elsewhere, catastrophe is avoided, primarily because in
practical situations common sense generally saves all but the most
pedantic of us from flagrant error. It is hard to judge, however, to
what extent the relative calm of modern statistics is due to its
domination by a vigorous school relatively well agreed within itself
about the foundations.

Although study of the foundations of a science does not have the
role that would be assigned to it by naive first-things-firstism, it has a
certain continuing importance as the science develops, influencing, and
being influenced by, the more immediately practical parts of the science.

2 Historical background 

The concept and problem of inductive inference have been prominent
in philosophy at least since Aristotle. Mathematical work on some
aspects of the problem of inference dates back at least to the early
eighteenth century. Leibniz is said to be the first to publish a
suggestion in that direction, but Jacob Bernoulli’s posthumous Ars
Conjectandi (1713) [B12] seems to be the first concerted effort.† This
mathematical work has always revolved around the concept of probability;
but, though there was active interest in probability for nearly a
century before the publication of Ars Conjectandi, earlier activity seems
not to have been concerned with inductive inference.

† Valuable information on this and other topics of the early philosophic
history of probability is attractively presented in Keynes' treatise
[K4], especially in Chapters VII, XXIII, and the bibliography.

In the present century there has been and continues to be
extraordinary interest in mathematical treatment of problems of inductive
inference. For reasons I cannot and need not analyze here, this activity
has been strikingly concentrated in the English-speaking world.
It is known under several names, most of which stress some aspect of
the subject that seemed of overwhelming importance at the moment
when the name was coined. “Mathematical statistics,” one of its
earliest names, is still the most popular. In this name, “mathematical”
seems to be intended to connote rational, theoretical, or perhaps
mathematically advanced, to distinguish the subject from those problems
of gathering and condensing numerical data that can be considered apart
from the problem of inductive inference, the mathematical treatment
of which is generally relatively trivial. The name “statistical
inference” recognizes that the subject is concerned with inductive
inference. The name “statistical decision” reflects the idea that
inductive inference is not always, if ever, concerned with what to
believe in the face of inconclusive evidence, but that at least sometimes
it is concerned with what action to decide upon under such circumstances.
Within this book, there will be no harm in adopting the shortest possible
name, “statistics.”

It is unanimously agreed that statistics depends somehow on
probability. But, as to what probability is and how it is connected with
statistics, there has seldom been such complete disagreement and
breakdown of communication since the Tower of Babel. There must be
dozens of different interpretations of probability defended by living
authorities, and some authorities hold that several different
interpretations may be useful, that is, that the concept of probability
may have different meaningful senses in different contexts. Doubtless,
much of the disagreement is merely terminological and would disappear
under sufficiently sharp analysis. Some believe that it would all
disappear, or even that they have themselves already made the necessary
analysis.

Considering the confusion about the foundations of statistics, it is
surprising, and certainly gratifying, to find that almost everyone is
agreed on what the purely mathematical properties of probability are.
Virtually all controversy therefore centers on questions of interpreting
the generally accepted axiomatic concept of probability, that is, of
determining the extramathematical properties of probability.

The widely accepted axiomatic concept referred to is commonly
ascribed to Kolmogoroff [K7] and goes by his name. It should be
mentioned that there is some dissension from it on the part of a small
group led by von Mises [V2]. There are also a few minor technical
variations on the Kolmogoroff system that are sometimes of interest; they
will be discussed in § 3.4.

I would distinguish three main classes of views on the interpretation
of probability, for the purposes of this book, calling them objectivistic,
personalistic, and necessary. Condensed descriptions of these three
classes of views seem called for here. If some readers find these
descriptions condensed to the point of unintelligibility, let them be
assured that fuller ones will gradually be developed as the book proceeds.

Objectivistic views hold that some repetitive events, such as tosses
of a penny, prove to be in reasonably close agreement with the
mathematical concept of independently repeated random events, all with
the same probability. According to such views, evidence for the quality
of agreement between the behavior of the repetitive event and the
mathematical concept, and for the magnitude of the probability that
applies (in case any does), is to be obtained by observation of some
repetitions of the event, and from no other source whatsoever.

Personalistic views hold that probability measures the confidence
that a particular individual has in the truth of a particular proposition,
for example, the proposition that it will rain tomorrow. These views
postulate that the individual concerned is in some ways “reasonable,”
but they do not deny the possibility that two reasonable individuals
faced with the same evidence may have different degrees of confidence
in the truth of the same proposition.

Necessary views hold that probability measures the extent to which
one set of propositions, out of logical necessity and apart from human
opinion, confirms the truth of another. They are generally regarded
by their holders as extensions of logic, which tells when one set of
propositions necessitates the truth of another.

After what has been said about the intensity and complexity of the
controversy over the probability concept, you must realize that the
short taxonomy above is bound to infuriate any expert on the foundations
of probability, but I trust it may do the less learned more good
than harm.

The great burst of statistical research in the English-speaking world
in the present century has revolved around objectivistic views on the
interpretation of probability. As will shortly be explained, any purely
objectivistic view entails a severe difficulty for statistics. This
difficulty is recognized by members of the British-American School, if I
may use that name without its being taken too literally or at all
nationalistically, and is regarded by them as a great, though not
insurmountable, obstacle; indeed, some of them see it as the central
problem of statistics.

The difficulty in the objectivistic position is this. In any
objectivistic view, probabilities can apply fruitfully only to repetitive
events, that is, to certain processes; and (depending on the view in
question) it is either meaningless to talk about the probability that a
given proposition is true, or this probability can be only 1 or 0,
according as the proposition is in fact true or false. Under neither
interpretation can probability serve as a measure of the trust to be put
in the proposition. Thus the existence of evidence for a proposition can
never, on an objectivistic view, be expressed by saying that the
proposition is true with a certain probability. Again, if one must choose
among several courses of action in the light of experimental evidence, it
is not meaningful, in terms of objective probability, to compute which of
these actions is most promising, that is, which has the highest expected
income. Holders of objectivistic views have, therefore, no recourse but
to argue that it is not reasonable to assign probabilities to the truth
of propositions or to calculate which of several actions is the most
promising, and that the need expressed by the attempt to set up such
concepts must be met in other ways, if at all.

The British-American School has had great success in several
respects. The number of its adherents has rapidly increased. It has
contributed many procedures of strong intuitive appeal and (one feels) of
lasting worth. These have found widespread application in many
sciences, in industry, and in commerce. The success of the school may
pragmatically be taken as evidence for the correctness of the general
view on which it is based. Indeed, anyone who overthrows that view
must either discredit the procedures to which it has led, or show, as
I hope to show in this book, that they are on the whole consistent with
the alternative proposed.

Some, I among them, hold that the grounds for adopting an
objectivistic view are not overwhelmingly strong; that there are serious
logical objections to any such view; and, most important of all, that the
difficulty a strictly objectivistic view meets in statistics reflects real
inadequacy.

3 General outline of this book 

This book presents a theory of the foundations of statistics which is
based on a personalistic view of probability derived mainly from the
work of Bruno de Finetti, as expressed for example in [D2]. The theory
is presented in a tentative spirit, for I realize that the serious blemishes
in it apparent to me are not the only ones that will be discovered by
critical readers. A theory of the foundations of statistics that appears
contrary to the teaching of the most productive statisticians will
properly be regarded with extraordinary caution. Other views on
probability will, of course, be discussed in this book, partly for their
own interest and partly to explain the relationship between the
personalistic view on which this book is based and other views.

The book is organized into seventeen chapters, of which the present
introduction is the first. Chapters 2-7 are, so to speak, concerned with
the foundations at a relatively deep level. They develop, explain, and
defend a certain abstract theory of the behavior of a highly idealized
person faced with uncertainty. That theory is shown to have as
implications a theory of personal probability, corresponding to the
personalistic view of probability basic to this book, and also a theory of
utility due, in its modern form, to von Neumann and Morgenstern
[V4].

There is a transition, occurring in Chapter 8 and maintained
throughout the rest of the book, to a shallower level of the foundations
of statistics; I might say from pre-statistics to statistics proper. In
those later chapters, it is recognized that the theory developed in the
earlier ones is too highly idealized for immediate application. Some
compromises have to be made, and the appropriate ones are sought in an
analysis of some of the inventions and ideas of the British-American
School. It will, I hope, be demonstrated thereby that the superficially
incompatible systems of ideas associated on the one hand with a
personalistic view of probability and on the other with the
objectivistically inspired developments of the British-American School do
in fact lend each other mutual support and clarification.



CHAPTER 2


Preliminary Considerations on Decision in the Face of Uncertainty

1 Introduction

Decisions made in the face of uncertainty pervade the life of every
individual and organization. Even animals might be said continually
to make such decisions, and the psychological mechanisms by which
men decide may have much in common with those by which animals
do so. But formal reasoning presumably plays no role in the decisions
of animals, little in those of children, and less than might be wished in
those of men. It may be said to be the purpose of this book, and
indeed of statistics generally, to discuss the implications of reasoning
for the making of decisions.

Reasoning is commonly associated with logic, but it is obvious, as
many have pointed out, that the implications of what is ordinarily
called logic are meager indeed when uncertainty is to be faced. It has
therefore often been asked whether logic cannot be extended, by
principles as acceptable as those of logic itself, to bear more fully on
uncertainty. An attempt to extend logic in this way will be begun in
this chapter, differing in two important respects from most, but not
all, other attempts.

First, since logic is concerned with implications among propositions,
many have thought it natural to extend logic by setting up criteria for
the extent to which one proposition tends to imply, or provide evidence
for, another. It seems to me obvious, however, that what is ultimately
wanted is criteria for deciding among possible courses of action; and,
therefore, generalization of the relation of implication seems at best a
roundabout method of attack. It must be admitted that logic itself
does lead to some criteria for decision, because what is implied by a
proposition known to be true is in turn true and sometimes relevant to
making a decision. Should some notion of partial implication be
demonstrably even better articulated with decision than is implication
itself, that would be excellent; but how is such a notion to be sought
except by explicitly studying decision? Ramsey's discussion in [R1] of
the point at issue here is especially forceful.

Second, it is appealing to suppose that, if two individuals in the same
situation, having the same tastes and supplied with the same
information, act reasonably, they will act in the same way. Such
agreement, belief in which amounts to a necessary (as opposed to a
personalistic) view of probability, is certainly worth looking for.
Personally, I believe that it does not correspond even roughly with
reality, but, having at the moment no strong argument behind my
pessimism on this point, I do not insist on it. But I do insist that,
until the contrary be demonstrated, we must be prepared to find reasoning
inadequate to bring about complete agreement. In particular, the
extensions of logic to be adduced in this book will not bring about
complete agreement; and whether enough additional principles to do so, or
indeed any additional principles of much consequence, can be adduced, I
do not know. It may be, and indeed I believe, that there is an element in
decision apart from taste, about which, like taste itself, there is no
disputing.

The next four sections of this chapter build up a formal model, or
scheme, of the situation in which a person is faced with uncertainty;
the final two, in terms of this model, motivate and state some of the
few principles that seem to me entitled to be taken as postulates for
rational decision.

2 The person 

I am about to build up a highly idealized theory of the behavior of a
“rational” person with respect to decisions. In doing so I will, of
course, have to ask you to agree with me that such and such maxims of
behavior are “rational.” In so far as “rational” means logical, there is
no live question; and, if I ask your leave there at all, it is only as a
matter of form.† But our person is going to have to make up his mind in
situations in which criteria beyond the ordinary ones of logic will be
necessary. So, when certain maxims are presented for your consideration,
you must ask yourself whether you try to behave in accordance with
them, or, to put it differently, how you would react if you noticed
yourself violating them.

† The assumption that a person’s behavior is logical is, of course, far
from vacuous. In particular, such a person cannot be uncertain about
decidable mathematical propositions. This suggests, at least to me, that
the tempting program sketched by Polya [P6] of establishing a theory of
the probability of mathematical conjectures cannot be fully successful in
that it cannot lead to a truly formal theory, but de Finetti [D5] seems
more optimistic about the program.

It is brought out in economic theory that organizations sometimes
behave like individual people, so that a theory originally intended to
apply to people may also apply to (or may even apply better to) such
units as families, corporations, or nations. In view of this possibility,
economic theorists are sometimes reluctant to use the word “person,”
or even “individual,” for the behaving units to which they refer; but
for our purpose “person” threatens no confusion, though the possibility
of using it in an extended sense may well be borne in mind.

3 The world, and states of the world 

A formal description, or model, of what the person is uncertain about
will be needed. To motivate this formal description, let me begin
informally by considering a list of examples. The person might be
uncertain about:

1. Whether a particular egg is rotten.

2. Which, if any, in a particular dozen eggs are rotten.

3. The temperature at noon in Chicago yesterday.

4. What the temperature was and will be in the place now covered
by Chicago each noon from January 1, 1 A.D., to January 1, 4000 A.D.

5. The infinite sequence of heads and tails that will result from
repeated tosses of a particular (everlasting) coin.

6. The complete decimal expansion of π.

7. The exact and entire past, present, and future history of the
universe, understood in any sense, however wide.

These examples have a few features in common, though, if there are
more than a few, it is a discredit to my imagination. Thus, in each
there is some object about which the person is uncertain, an egg, a
dozen eggs, a temperature, a sequence of temperatures, etc. Each
object admits a certain class of descriptions that might thinkably apply
to it. To illustrate, the egg of Example 1 might be rotten or not, and
the terms of the example are meant to exclude any other description
from consideration, though, of course, a real egg has many other
features. Again, since any subset of the dozen eggs (including the extreme
cases of all and none at all) might be rotten, there are 2^12 descriptions
associated with Example 2. For Example 3 and each subsequent one,
there are an infinite number of descriptions, though the array of
descriptions is more complicated in some than in others, reaching the
ultimate of complexity in Example 7. Example 6 is a little anomalous
in that anything the person does not know about the description of π
he could know in principle by thinking sufficiently hard about it, that
is, by logic alone. This point, banal to some readers, needs explanation
for others. If, for example, π is understood to be the area of a circle of
unit radius, it follows by logic alone that π is not greater than the area
of a square circumscribing the unit circle, that is, π < 4. By an
elaboration of this method π can be computed to any degree of accuracy,
and by other purely logical methods many other facts about π can be
established, such as the fact that π is not a rational number.
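
One way to picture the "elaboration of this method" is to squeeze the
unit circle between inscribed and circumscribed regular polygons and to
keep doubling the number of sides. The Python sketch below uses a
classical area recurrence for that doubling; the recurrence and the name
`pi_bounds` are assumptions of this illustration, not something stated in
the text, and floating-point arithmetic stands in for exact reasoning.

```python
import math


def pi_bounds(doublings):
    """Bracket pi between polygon areas around the unit circle.

    Starts from the inscribed square (area 2) and the circumscribed
    square (area 4), then doubles the number of sides `doublings` times.
    """
    lower, upper = 2.0, 4.0
    for _ in range(doublings):
        # Inscribed 2n-gon: geometric mean of the two n-gon areas.
        lower = math.sqrt(lower * upper)
        # Circumscribed 2n-gon: harmonic mean of the new inscribed area
        # and the old circumscribed area.
        upper = 2.0 * lower * upper / (lower + upper)
    return lower, upper


for k in (0, 1, 5, 12):
    lo, hi = pi_bounds(k)
    print(f"{4 * 2**k:6d} sides: {lo:.7f} < pi < {hi:.7f}")
```

With no doublings the upper bound is the circumscribed square of the
text, area 4; each further doubling narrows the bracket, using nothing
beyond arithmetic and square roots.
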

In connection with the concepts suggested by the preceding
paragraph, the following nomenclature is proposed as brief, suggestive,
and in reasonable harmony with the usages of statistics and ordinary
discourse.


Term                            Definition

the world                       the object about which the person is
                                concerned
a state (of the world)          a description of the world, leaving no
                                relevant aspect undescribed
the true state (of the world)   the state that does in fact obtain, i.e.,
                                the true description of the world
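
To make this nomenclature concrete for the dozen eggs of Example 2, a
state can be written as one rotten-or-not entry per egg. The Python
sketch below is only an illustration: the encoding (True for rotten),
the egg numbering, and the particular "true state" chosen are all
assumptions of the sketch.

```python
from itertools import product

EGGS = 12

# A state of the world of Example 2: one entry per egg, True meaning
# "this egg is rotten". No relevant aspect is left undescribed.
states = list(product((False, True), repeat=EGGS))
print(len(states))  # number of states, 2**12

# The true state is whichever single state in fact obtains. Here one is
# picked arbitrarily for illustration: eggs 3 and 7 rotten, the rest good.
true_state = tuple(i in (3, 7) for i in range(EGGS))
print(true_state in states)
```

The person of the theory does not know which element of `states` is the
true one; that ignorance is what the model is meant to describe.
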


In application of the theory, the question will arise as to which world
to use in a given context. Thus, if the person is interested in the only
brown egg in a dozen, should that egg or the whole dozen be taken as
the world? It will be seen as the theory is developed that in principle
no harm is done by taking the larger of two worlds as a model of the
situation. One is therefore tempted to adopt, once and for all, one
world sufficiently large, say Example 7. The most serious objection to
this is that Example 7 is vague, and some mathematical and philosophical
experience suggests that the vagueness cannot be removed without
ruining the universality of the example. It may also be added that the
use of modest little worlds, tailored to particular contexts, is often a
simplification, the advantage of which is justified by a considerable
body of mathematical experience with related ideas.

The sense in which the world of a dozen eggs is larger than the world
of the one brown egg in the dozen is in some respects obvious. It may
be well, however, to emphasize that a state of the smaller world
corresponds not to one state of the larger, but to a set of states. Thus,
“The brown egg is rotten” describes the smaller world completely, and
therefore is a state of it; but the same statement leaves much about the
larger world unsaid and corresponds to a set of 2^11 states of it. In the
sense under discussion a smaller world is derived from a larger by
neglecting some distinctions between states, not by ignoring some states
outright. The latter sort of contraction may be useful in case certain
states are regarded by the person as virtually impossible so that they
can be ignored.
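
The correspondence between the two worlds can be checked by brute force.
In the Python sketch below, the position of the brown egg (`BROWN`) and
the function name `to_small_world` are assumptions made for the
illustration; the point is only that one state of the small world
gathers up many states of the large one.

```python
from itertools import product

EGGS = 12
BROWN = 0  # assumed position of the brown egg within the dozen

# Large world: every rotten/good pattern over the dozen (True = rotten).
large_world = list(product((False, True), repeat=EGGS))


def to_small_world(state):
    """Neglect every distinction except whether the brown egg is rotten."""
    return state[BROWN]


# The small-world state "the brown egg is rotten", viewed as the set of
# large-world states that it leaves undistinguished.
brown_rotten = [s for s in large_world if to_small_world(s)]
print(len(brown_rotten))  # 2**11 of the 2**12 large-world states
```

No state of the large world is discarded by `to_small_world`; states
are merely lumped together, which is the kind of contraction the
paragraph above has in mind.
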

4 Events 

An event is a set of states. For example, in connection with the
world of Example 2, the person might well be concerned with the event
that exactly one egg in the dozen is rotten (an event having 12 states
as elements), or, a little less academically, that at least one of the eggs
is rotten (an event having 2^12 − 1 states as elements, i.e., all the
states in the world but one). In connection with the world of Example 3,
the person might be concerned with the event, having an infinite number
of states, that the temperature at noon in Chicago yesterday was
below freezing. To give a final illustration, of a more mathematical
flavor, consider in connection with Example 5 the event that the ratio
of the number of heads to tails approaches 3 as the sequence progresses
to infinity.
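
The two events mentioned for Example 2 can be built directly as sets of
states and counted. This Python sketch assumes the same encoding as
before (a tuple of twelve entries, True meaning rotten), which is a
choice of the illustration rather than of the text.

```python
from itertools import product

# The world of Example 2 as a set of states (True = rotten).
states = set(product((False, True), repeat=12))

# An event is simply a subset of the states.
exactly_one = {s for s in states if sum(s) == 1}
at_least_one = {s for s in states if any(s)}

print(len(exactly_one))    # 12
print(len(at_least_one))   # 2**12 - 1
print(states - at_least_one)  # the one state excluded: no egg rotten
```
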

In connection with any given world, there are two events that are
of the utmost logical importance, though in ordinary discourse it may
seem banal even to mention their existence. These are the universal
and the vacuous events. The universal event, here to be symbolized
by S, is the event having every state of the world as element. In so
far as “world” has a real technical meaning, S is the world. The
vacuous event, which can here be safely enough symbolized by the 0 of
arithmetic, is the event having no states as elements. To illustrate, in
Example 1 the event that the egg is rotten or good is the universal
event, and that it is both rotten and good is the vacuous event.

It is important to be able to express the idea that a given event
contains the true state among its elements. English usage seems to offer
no alternative to the rather stuffy expression, "the event obtains."

The theory under development makes no formal reference to time.
In particular, the concept of event as here formulated is timeless, though
temporal ideas may be employed in the description of particular events.
Thus, it would not be said that Lincoln's assassination is an event that
occurred in 1865 and that the next return of Halley's comet is one that
will occur in 1985, but that Lincoln's assassination in 1865 and the
return of Halley's comet in, but not before, 1985 are events that
obtain.

Modern mathematical usage, especially that of a branch of mathematics
called Boolean algebra, suggests the following table of definitions
in connection with the concepts of state and event. Some of
these are synonyms, others abbreviations, and still others new terms
compounded out of old.


Table 1. Mathematical nomenclature pertaining to states and events

Term                                     Definition

(Basic terms)
set                                      event
A, B, C, ...                             generic symbols for events
s, s', s'', ...                          generic symbols for states
S                                        the universal event
0                                        the vacuous event

(Relations)
$s \in A$                                s is an element of A, i.e., a state in A †
$A \subset B$ (or $B \supset A$)         A is contained in B, i.e., every element
                                         of A is an element of B
$A = B$                                  A equals B, i.e., A is the same set as B,
                                         i.e., A and B have exactly the same
                                         elements

(Constructs)
the complement of A with respect to S    those elements of S that are not in A
$\sim A$                                 the complement of A with respect to S
the union of the $A_i$'s                 those elements of S that are elements
                                         of at least one of the sets $A_1$, $A_2$, etc.
$\bigcup_i A_i$                          the union of the $A_i$'s
$A \cup B$                               the union of A and B, i.e., those elements
                                         of S that are elements of A or
                                         B (possibly of both)
the intersection of the $A_i$'s          those elements of S that are elements
                                         of each of the sets $A_1$, $A_2$, etc.
$\bigcap_i A_i$                          the intersection of the $A_i$'s
$A \cap B$                               the intersection of A and B, i.e., those
                                         elements of S that are elements of
                                         both A and B

† Typographical note. The Porson font of the Greek alphabet (α, β, γ, δ, ϵ, ...)
is the one almost always printed, at least in America, when mathematical constants
and variables are denoted by Greek letters. The symbol ε used in this and some other
publications to denote "element of" is, however, the epsilon of the Vertical font
(α, β, γ, δ, ε, ...). Some publications use the special symbol ∈; and some use ϵ,
the Porson epsilon, presumably because of its resemblance to ∈. The latter usage
entails either using ϵ for two different purposes or else changing fonts in mid-alphabet
(α, β, γ, δ, ε, ...) when constants and variables are denoted by Greek letters.


Though the notations introduced in Table 1 are very elementary
and of great utility, they are not ordinarily taught except in connection
with logic or relatively advanced mathematics. A set of exercises
illustrating their use is therefore given below in the form of a numbered
list of statements. These statements are true whatever the sets A, B,
C may be. Mathematicians would for the most part verify them by
translating them into English and appealing to common sense, though
in complicated cases explicit use might be made of Exercise 9. Diagrams,
called Venn diagrams, in which sets are symbolized by areas,
as illustrated by Figure 1, are often suggestive.




Figure 1 


It is a remarkable and useful fact that any universally valid statement
about sets remains so if, throughout, $\cup$ is interchanged with $\cap$,
0 with S, and $\subset$ with $\supset$. The dual in this sense of each
exercise should be studied along with the exercise itself. For example,
the dual of Exercise 7 is: $A \supset B$, if and only if $A = A \cup B$.
Note that the first parts of Exercises 1 through 6 are dual to the second
parts.

It may be remarked that, if Exercises 1-6 are taken as axioms and
7 as a definition, Exercises 8-21 and also the duality principle follow
formally from them. For example, 10 can be proved thus: By 7, if
$A \cap B$ is A, then $A \subset B$; but, by 1, $A \cap A$ is A; therefore
$A \subset A$. Again, 8 can be proved, using 6, 3, 2, 1, 3, and 6 in that
order, thus:

$$0 \cap A = (A \cap \sim A) \cap A = (\sim A \cap A) \cap A = \sim A \cap (A \cap A) = \sim A \cap A = A \cap \sim A = 0. \tag{1}$$

Such formal demonstration is fun and helps develop mathematical skill.
In the present exercises the novice, however, should consider it as a
possible supplement to, but not as a substitute for, demonstration by
interpretation.

If the exercises fail to render the notations familiar, it would be best
to talk with someone to whom they are already familiar or, failing that,
to read in any elementary book where the subject is treated, for example,
Chapter II, "The Boole-Schroeder Algebra," in the text of
Lewis and Langford [L7].

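Exercises illustrating Boolean algebra

1. $A \cap A = A = A \cup A$.

2. $(A \cap B) \cap C = A \cap (B \cap C)$; $(A \cup B) \cup C = A \cup (B \cup C)$.
   (These facts often render parentheses superfluous.)

3. $A \cap B = B \cap A$; $A \cup B = B \cup A$.

4. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$; $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

5. $S \cap A = A$; $0 \cup A = A$.

6. $A \cap (\sim A) = 0$; $A \cup (\sim A) = S$.

7. $A \subset B$, if and only if $A = A \cap B$.

8. $0 \cap A = 0$.

9. $A = B$, if and only if $A \subset B$ and $B \subset A$.

10. $A \subset A$.

11. $(A \cap B) \subset A$.

12. If $A \subset B$, then $(A \cap C) \subset (B \cap C)$, and $(A \cup C) \subset (B \cup C)$.

13. $(A \cup B) \subset C$, if and only if $A \subset C$ and $B \subset C$.

14. $0 \subset A \subset S$.

15. $A \cap (A \cup B) = A$.

16. $\sim(\sim A) = A$.

17. $\sim(A \cup B) = (\sim A) \cap (\sim B)$ (De Morgan's theorem).

18. $\sim 0 = S$.

19. $A \cap (\sim A \cup B) = A \cap B$.

20. $A \subset B$, if and only if $(\sim B) \subset (\sim A)$.

21. $A \subset B$, if and only if $A \cap (\sim B) = 0$.

22. $\sim(\bigcup_i A_i) = \bigcap_i (\sim A_i)$ (General De Morgan's theorem).

23. $A \cup (\bigcap_i B_i) = \bigcap_i (A \cup B_i)$.

24. $A \cap (\bigcap_i B_i) = \bigcap_i (A \cap B_i)$.

25. $(\bigcup_i A_i) \cup (\bigcup_i B_i) = \bigcup_i (A_i \cup B_i)$.

26. $(\bigcap_i A_i) \cup (\bigcap_j B_j) = \bigcap_{i,j} (A_i \cup B_j)$.

27. $A \subset (\bigcap_i B_i)$, if and only if $A \subset B_i$ for every i.

28. $(\bigcap_i B_i) \subset B_j \subset (\bigcup_i B_i)$ for every j.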
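
For a reader who wants a mechanical check in addition to demonstration by interpretation, several of the exercises can be tested exhaustively on a small world. The sketch below is only an illustration: it assumes a hypothetical three-state world `S`, represents events as Python `frozenset`s, and checks a handful of the statements for every choice of A, B, and C. Passing on one small world is evidence, not a proof.

```python
from itertools import combinations, product

S = frozenset(range(3))   # a hypothetical world with three states
EMPTY = frozenset()       # the vacuous event, written 0 in the text

def subsets(universe):
    """Return every event (subset) of the given universe."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(a):
    """Complement of the event a with respect to S."""
    return S - a

def check_exercises():
    """Check several exercises for all events A, B, C of the small world."""
    for A, B, C in product(subsets(S), repeat=3):
        assert (A & (B | C)) == ((A & B) | (A & C))   # Exercise 4, first part
        assert (A | (B & C)) == ((A | B) & (A | C))   # Exercise 4, second part
        assert (A & comp(A)) == EMPTY                 # Exercise 6, first part
        assert (A | comp(A)) == S                     # Exercise 6, second part
        assert (A <= B) == (A == (A & B))             # Exercise 7
        assert (A >= B) == (A == (A | B))             # dual of Exercise 7
        assert comp(A | B) == (comp(A) & comp(B))     # Exercise 17
        assert (A & (comp(A) | B)) == (A & B)         # Exercise 19
        assert (A <= B) == (comp(B) <= comp(A))       # Exercise 20
        assert (A <= B) == ((A & comp(B)) == EMPTY)   # Exercise 21
    return True

print(check_exercises())  # True
```

Here `<=` on frozensets is the subset test, so `A <= B` plays the role of "A is contained in B".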

5 Consequences, acts, and decisions

To say that a decision is to be made is to say that one of two or more
acts is to be chosen, or decided on. In deciding on an act, account
must be taken of the possible states of the world, and also of the
consequences implicit in each act for each possible state of the world. A
consequence is anything that may happen to the person.

Consider an example. Your wife has just broken five good eggs into
a bowl when you come in and volunteer to finish making the omelet.
A sixth egg, which for some reason must either be used for the omelet
or wasted altogether, lies unbroken beside the bowl. You must decide
what to do with this unbroken egg. Perhaps it is not too great an
oversimplification to say that you must decide among three acts only,
namely, to break it into the bowl containing the other five, to break it
into a saucer for inspection, or to throw it away without inspection.
Depending on the state of the egg, each of these three acts will have
some consequence of concern to you, say that indicated by Table 1.

Table 1. An example illustrating acts, states, and consequences

                                           State
Act                  Good                            Rotten

break into bowl      six-egg omelet                  no omelet, and five good eggs
                                                     destroyed

break into saucer    six-egg omelet, and a saucer    five-egg omelet, and a saucer
                     to wash                         to wash

throw away           five-egg omelet, and one good   five-egg omelet
                     egg destroyed

Even the little example concerning the omelet suggests how varied
the things, or experiences, regarded as consequences can be. They
might in general involve money, life, state of health, approval of friends,
well-being of others, the will of God, or anything at all about which the
person could possibly be concerned. Consequences might appropriately
be called states of the person, as opposed to states of the world. They
might also be referred to, with some extension of the economic notion
of income, as the possible incomes of the person. In any one problem,
the set of consequences envisaged will be denoted by F, and the
individual consequences will be denoted by f, g, h, etc. In the omelet
example, F consists of the six consequences tabulated in Table 1: six-egg
omelet; no omelet, and five good eggs destroyed; etc.

If two different acts had the same consequences in every state of the
world, there would from the present point of view be no point in
considering them two different acts at all. An act may therefore be
identified with its possible consequences. Or, more formally, an act is a
function attaching a consequence to each state of the world. The
notation f will be used to denote an act, that is, a function, attaching the
consequence f(s) to the state s. The notation f is logically a better
name for a function than the more customary f(s) for exactly the same
reason that the word "logarithm" is a better term for logarithm than
"logarithm of x" would be. The notational distinction involved here is
often justifiably neglected in mathematical work, but we will have
special need to observe it, at least in connection with acts, as will soon be
explained. When several acts are to be discussed at once, they may be
denoted by different letters thus: f, g, h; by the use of primes thus: f,
f', f''; or by subscripts thus: $f_1$, $f_2$. The set of all acts available in a
given situation will be denoted by F or a similar symbol. In the
example of the omelet, F has three acts as elements. If, for example, f
denotes the first of the three acts listed in Table 1, then f is defined
thus:


(1)     f(good) = six-egg omelet,
        f(rotten) = no omelet, and five good eggs destroyed.
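
The identification of an act with a function from states to consequences can be made concrete. The following is a minimal sketch of the omelet example, assuming each act is stored as a Python dict keyed by state; the names `STATES`, `ACTS`, `same_act`, and `CONSEQUENCES` are illustrative and not the author's notation.

```python
# States of the world in the omelet example.
STATES = ("good", "rotten")

# Each act is a function from states to consequences, stored here as a dict.
ACTS = {
    "break into bowl": {
        "good": "six-egg omelet",
        "rotten": "no omelet, and five good eggs destroyed",
    },
    "break into saucer": {
        "good": "six-egg omelet, and a saucer to wash",
        "rotten": "five-egg omelet, and a saucer to wash",
    },
    "throw away": {
        "good": "five-egg omelet, and one good egg destroyed",
        "rotten": "five-egg omelet",
    },
}

def same_act(f, g):
    """Two acts are identified when they agree in every state of the world."""
    return all(f[s] == g[s] for s in STATES)

# The set of consequences envisaged: every value taken by some act in some state.
CONSEQUENCES = {c for act in ACTS.values() for c in act.values()}

f = ACTS["break into bowl"]
print(f["good"])          # six-egg omelet
print(len(CONSEQUENCES))  # 6
```

The distinction between the act `f` (the whole dict) and its value `f["good"]` mirrors the distinction between f and f(s) drawn in the text.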


The argument might be raised that the formal description of decision
that has thus been erected seems inadequate because a person may not
know the consequences of the acts open to him in each state of the
world. He might be so ignorant, for example, as not to be sure whether
one rotten egg will spoil a six-egg omelet. But in that case nothing
could be simpler than to admit that there are four states in the world
corresponding to the two states of the egg and the two conceivable
answers to the culinary question whether one bad egg will spoil a
six-egg omelet. It seems to me obvious that this solution works in the
greatest generality, though a thoroughgoing analysis might not be
trivial. A reader interested in the technicalities of this point or that of
the succeeding paragraph will find an extensive discussion of a similar
problem in Chapter II of [V4], where von Neumann and Morgenstern
discuss the reduction of a general game to its reduced form.

Again, the formal description might seem inadequate in that it does
not provide explicitly for the possibility that one decision may lead to
another. Thus, if the omelet should be spoiled by breaking a rotten
egg into it, new questions might arise about what to substitute for
breakfast and how to appease your justifiably furious wife. But, just
as in the preceding paragraph an apparent shortcoming of the proposed
mode of description was attributed to an incomplete analysis of the
possible states, here I would say that the list of available acts envisaged
in Table 1 is inadequate for the interpretation that has just been put
on the problem. Where the single act "break into bowl" now stands,
there should be several, such as "break into bowl, and in case of
disaster have toast," "break into bowl, and in case of disaster take family
to a neighboring restaurant for breakfast." Appropriate consequences
of these new acts can easily be imagined.
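
The replacement of one act by several compound acts can be sketched in code. This is only an illustration under stated assumptions: the consequence strings for the rotten state are hypothetical labels (the text leaves them to the reader's imagination), and `bowl_policy` and `POLICIES` are invented names.

```python
# The two fallback plans named in the text.
FALLBACKS = ("have toast", "take family to a neighboring restaurant for breakfast")

def bowl_policy(fallback):
    """Build the act 'break into bowl, and in case of disaster <fallback>'."""
    def act(state):
        if state == "good":
            return "six-egg omelet"
        # Hypothetical consequence label for the rotten state.
        return f"no omelet, five good eggs destroyed; {fallback}"
    return act

# Each compound act is still a single function from states to consequences.
POLICIES = {
    f"break into bowl, and in case of disaster {fb}": bowl_policy(fb)
    for fb in FALLBACKS
}

for name, act in POLICIES.items():
    print(name, "->", act("rotten"))
```

The two policies agree when the egg is good and differ only when it is rotten, so they are distinct acts in the formal sense even though they begin with the same physical action.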

As has just been suggested, what in the ordinary way of thinking
might be regarded as a chain of decisions, one leading to the other in
time, is in the formal description proposed here regarded as a single
decision. To put it a little differently, it is proposed that the choice of a
policy or plan be regarded as a single decision. This point of view,
though not always in so explicit a form, has played a prominent role
in the statistical advances of the present century. For example, the
great majority of experimentalists, even today, suppose that the
function of statistics and of statisticians is to decide what conclusions to
draw from data gathered in an experiment or other observational
program. But statisticians hold it to be lacking in foresight to gather data
without a view to the method of analysis to be employed, that is, they
hold that the design and analysis of an experiment should be decided
upon as an articulated whole.

The point of view under discussion may be symbolized by the
proverb, "Look before you leap," and the one to which it is opposed by the
proverb, "You can cross that bridge when you come to it." When two
proverbs conflict in this way, it is proverbially true that there is some
truth in both of them, but rarely, if ever, can their common truth be
captured by a single pat proverb. One must indeed look before he
leaps, in so far as the looking is not unreasonably time-consuming and
otherwise expensive; but there are innumerable bridges one cannot
afford to cross, unless he happens to come to them.

Carried to its logical extreme, the "Look before you leap" principle
demands that one envisage every conceivable policy for the government
of his whole life (at least from now on) in its most minute details, in
the light of the vast number of unknown states of the world, and decide
here and now on one policy. This is utterly ridiculous, not, as some
might think, because there might later be cause for regret, if things
did not turn out as had been anticipated, but because the task implied
in making such a decision is not even remotely resembled by human
possibility. It is even utterly beyond our power to plan a picnic or to
play a game of chess in accordance with the principle, even when the
world of states and the set of available acts to be envisaged are
artificially reduced to the narrowest reasonable limits.

Though the "Look before you leap" principle is preposterous if
carried to extremes, I would none the less argue that it is the proper
subject of our further discussion, because to cross one's bridges when one
comes to them means to attack relatively simple problems of decision
by artificially confining attention to so small a world that the "Look
before you leap" principle can be applied there. I am unable to
formulate criteria for selecting these small worlds and indeed believe that
their selection may be a matter of judgment and experience about which
it is impossible to enunciate complete and sharply defined general
principles, though something more will be said in this connection in § 5.5.
On the other hand, it is an operation in which we all necessarily have
much experience, and one in which there is in practice considerable
agreement.

In view of the "Look before you leap" principle, acts and decisions,
like events, are timeless. The person decides "now" once for all; there
is nothing for him to wait for, because his one decision provides for all
contingencies. None the less, temporal modes of description, though
translatable into atemporal ones, are often suggestive. Thus, there
will be occasion to analyze and make frequent use of the idea of
deferring a decision until an observation relevant to it has been made.

6 The simple ordering of acts with respect to preference

Of two acts f and g, it is possible that the person prefers f to g.
Loosely speaking, this means that, if he were required to decide between
f and g, no other acts being available, he would decide on f.

This procedure for testing preference is not entirely adequate, if only
because it fails to take account of, or even define, the possibility that
the person may not really have any preference between f and g,
regarding them as equivalent; in which case his choice of f should not be
regarded as significant. If the person really does regard f and g as
equivalent, that is, if he is indifferent between them, then, if f or g
were modified by attaching an arbitrarily small bonus to its
consequences in every state, the person's decision would presumably be for
whichever act was thus modified. This test for indifference does not
provide an altogether satisfactory definition, since it begs the question
to some extent by postulating in effect that the tester knows what
constitutes a small bonus. Another attempted solution would be to say
that the person knows by introspection whether he has decided
haphazardly or in response to a definite feeling of preference. This sort of
solution seems to me especially objectionable, because I think it of
great importance that preference, and indifference, between f and g be
determined, at least in principle, by decisions between acts and not by
response to introspective questions. In spite of the difficulty of
distinguishing between preference and indifference, I think enough has
been said for us to proceed to a postulational treatment of them.

The very meaning of the relationship of preference that I have
attempted to establish in the preceding paragraph implies that the
person cannot simultaneously prefer f to g and g to f. In the postulational
treatment of the relationships of preference and indifference, it will be
technically convenient to work with the relation "is not preferred to"
rather than directly with its complementary relation "is preferred to."
Thus, rather than say that it is impossible that both f is preferred to
g and g to f, I might say that, of any two acts f and g, f is not preferred
to g or g is not preferred to f, possibly both. Again, the definition of
preference suggests that, if f is not preferred to g, and g is not preferred
to h, then it is impossible that f should be preferred to h.

The two assumptions just made about the relation "is not preferred
to" are sometimes expressed in ordinary mathematical usage by saying
that the relation is a simple ordering among acts. Formally, a relation
$\mathrel{\le\cdot}$ among a set of elements x, y, z, ... is called a simple
ordering, in this book, if and only if for every x, y, and z:

1. Either $x \mathrel{\le\cdot} y$, or $y \mathrel{\le\cdot} x$.

2. If $x \mathrel{\le\cdot} y$, and $y \mathrel{\le\cdot} z$, then $x \mathrel{\le\cdot} z$.
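
The two defining conditions can be tested directly on any finite set. The sketch below is illustrative only: `is_simple_ordering` is a hypothetical helper, the numeric `SCORE` table is an assumed stand-in for a person's preferences, and the cyclic relation is a deliberately intransitive counterexample.

```python
from itertools import product

def is_simple_ordering(elements, leq):
    """Check conditions 1 and 2 of the definition on a finite set."""
    elements = list(elements)
    # Condition 1: for every x and y, either x <= y or y <= x.
    complete = all(leq(x, y) or leq(y, x)
                   for x, y in product(elements, repeat=2))
    # Condition 2: x <= y and y <= z together imply x <= z.
    transitive = all(leq(x, z)
                     for x, y, z in product(elements, repeat=3)
                     if leq(x, y) and leq(y, z))
    return complete and transitive

# Assumed example: rank three acts by a numeric score (ties allowed).
SCORE = {"f": 1, "g": 2, "h": 2}
def leq_score(x, y):
    return SCORE[x] <= SCORE[y]

# Counterexample: f <= g, g <= h, h <= f, which is complete but not transitive.
CYCLE = {("f", "g"), ("g", "h"), ("h", "f")}
def leq_cycle(x, y):
    return x == y or (x, y) in CYCLE

print(is_simple_ordering(SCORE, leq_score))  # True
print(is_simple_ordering(SCORE, leq_cycle))  # False
```

Note that condition 1, applied with x and y the same element, already forces every element to stand in the relation to itself.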

Borrowing from arithmetic the suggestive abbreviation $\le$ for the
relation "is not preferred to," the assumption that $\le$ is a simple
ordering can be expressed formally by a postulate, thus:

P1 The relation $\le$ is a simple ordering among acts.

It is noteworthy that P1 makes no explicit reference to states of the
world. Except possibly for mathematical refinements,† it seems to me
that no additional postulates can be formulated without making such
reference; at any rate none will be in this book.

P1 by itself is not very rich in consequences, but one easily proved
theorem following from it may be mentioned.

Theorem 1 If F is a finite set of acts, there exist f and h in F such
that for all g in F

        $f \le g \le h$.

Theorem 1 is especially relevant to application of the theory of
decision, because I interpret the theory to imply that, if F is finite, the
person will decide on an act h in F to which no other act in F is
preferred, the existence of at least one such h being guaranteed by the
theorem.
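
Theorem 1 can be illustrated by a direct search over a finite set of acts. This is a sketch under assumptions: the acts are hypothetical labels, and the ordering is induced by an assumed numeric `SCORE`, which is one convenient way to obtain a simple ordering and not the only one.

```python
def extremes(acts, leq):
    """Return the acts f with f <= g for all g, and the acts h with g <= h for all g."""
    acts = list(acts)
    least = [f for f in acts if all(leq(f, g) for g in acts)]
    greatest = [h for h in acts if all(leq(g, h) for g in acts)]
    return least, greatest

# Assumed simple ordering on three hypothetical acts, with a tie between g and h.
SCORE = {"f": 1, "g": 2, "h": 2}
def leq(x, y):
    return SCORE[x] <= SCORE[y]

least, greatest = extremes(SCORE, leq)
print(least)     # ['f']
print(greatest)  # ['g', 'h']
```

Both lists are nonempty, as the theorem guarantees for a finite set, but neither need contain just one act: here the person could decide on either g or h.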

It is often appropriate to consider infinite sets of available acts. In
economic contexts, for example, it is generally an inappropriate
complication to take explicit account of the possibility that all transactions
must be in integral numbers of pennies. If infinite sets of available acts
are set up and interpreted without some mathematical tact, unrealistic
conclusions are likely to follow. Suppose, for example, that you were
free to choose any income, provided it be definitely less than $100,000
per year. Precisely which income would you choose, abstracting from
the indivisibility of pennies?
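
The difficulty in the income example is that the set of admissible incomes has no greatest element. A minimal sketch, assuming incomes are exact rationals and that more income is always preferred; `CAP` and `better_income` are illustrative names:

```python
from fractions import Fraction

CAP = Fraction(100_000)  # incomes must be strictly below this figure

def better_income(x):
    """Given any admissible income, return a larger one that is still admissible."""
    if not x < CAP:
        raise ValueError("income must be strictly less than the cap")
    return (x + CAP) / 2  # the midpoint between x and the cap

x = Fraction(99_999)
y = better_income(x)
print(x < y < CAP)  # True
```

Because `better_income` succeeds for every admissible income, no admissible income is one to which none other is preferred, so the conclusion of Theorem 1 fails for this infinite set.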

† For example, such topological assumptions about the space with neighborhoods
defined in terms of $\le$ as connectedness, local compactness, or density.

It is sometimes convenient to supplement the relation $\le$ by other
relations derived from it in accordance with the definitions in Table 1,
analogous definitions being applicable to any simple ordering. The
assumption of simple ordering, P1, has several implications for the
derived relations $\ge$, $<$, $>$, and $=$. These are generally strongly
suggested by the properties of the corresponding relations in arithmetic.


Table 1. Table of relations derived from $\le$

New Relation                                 Definition

$f \ge g$                                    $g \le f$
$f < g$, i.e., g is preferred to f           It is false that $g \le f$
$f > g$                                      $g < f$
$f = g$, i.e., f is equivalent to (or        $f \le g$, and $g \le f$
  indifferent with respect to) g
g is between f and h                         $f \le g \le h$, or $h \le g \le f$


A few such implications of P1 are listed below, with no intention of
completeness, as exercises for those who may not already be familiar
with the elementary properties of simple ordering.

Exercises

1. The relation $\ge$ is also a simple ordering.

2. All the relations $\le$, $\ge$, $<$, $>$, and $=$ are transitive, that is,
they can be validly substituted for $\mathrel{\le\cdot}$ in the second part of
the definition of simple ordering.

3. Between any pair of acts f, g, one and only one of the three
relations $<$, $=$, and $>$ holds.

4. If $f < g$, and $g = h$, then $f < h$.

5. If $f = g$, then $g = f$.

6. For any f, $f = f$.

7. At least one of three acts f, g, h is between the other two. When
can there be more than one such?
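
The derived relations, and Exercise 3 in particular, can be checked on a small example. The sketch below assumes a simple ordering induced by a hypothetical numeric `SCORE`; the function names mirror the table of derived relations but are otherwise invented.

```python
from itertools import product

# Assumed simple ordering on three hypothetical acts.
SCORE = {"f": 1, "g": 2, "h": 2}

def leq(x, y):
    """x is not preferred to y."""
    return SCORE[x] <= SCORE[y]

def geq(x, y):
    return leq(y, x)

def lt(x, y):
    """y is preferred to x: it is false that y <= x."""
    return not leq(y, x)

def gt(x, y):
    return lt(y, x)

def equiv(x, y):
    """x is equivalent to (indifferent with respect to) y."""
    return leq(x, y) and leq(y, x)

def between(x, y, z):
    """y is between x and z."""
    return (leq(x, y) and leq(y, z)) or (leq(z, y) and leq(y, x))

# Exercise 3: exactly one of <, =, > holds between any pair of acts.
trichotomy = all([lt(x, y), equiv(x, y), gt(x, y)].count(True) == 1
                 for x, y in product(SCORE, repeat=2))
print(trichotomy)              # True
print(equiv("g", "h"))         # True
print(between("f", "g", "h"))  # True
```

In this example g and h are equivalent, so each of them is between the other two acts, which bears on the question raised in Exercise 7.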

Two very different sorts of interpretations can be made of P1 and
the other postulates to be adduced later. First, P1 can be regarded as
a prediction about the behavior of people, or animals, in decision
situations. Second, it can be regarded as a logic-like criterion of
consistency in decision situations. For us, the second interpretation is the
only one of direct relevance, but it may be fruitful to discuss both,
calling the first empirical and the second normative.

Logic itself admits an empirical as well as a normative
interpretation. Thus, if an experimental subject believes certain propositions,
it is to be expected that he will also believe their logical consequences
and disbelieve the negations of these consequences. This theory of
human psychology has some validity and is of great practical utility in our
everyday dealings with other people, though it is very crude and
approximate. For one thing, people often do make elementary mistakes
in logic; more refined theories would attribute these mistakes to such
things as accident or subconscious motivation. For another, if
anyone who believed the axioms of mathematics also believed all that they
imply and nothing that they contradict, mathematical study would be
superfluous for him; such a person would, as has been explained, be
able to state the ten-thousandth or any other term in the decimal
expansion of π on demand. To summarize, logic can be interpreted as a
crude but sometimes handy empirical psychological theory.

The principal value of logic, however, is in connection with its
normative interpretation, that is, as a set of criteria by which to detect, with
sufficient trouble, any inconsistencies there may be among our beliefs,
and to derive from the beliefs we already hold such new ones as
consistency demands. It does not seem appropriate here to attempt an
analysis of why and in what contexts we wish to be consistent; it is
sufficient to allude to the fact that we often do wish to be so.

Analogously, P1 together with the postulates to be adduced later can
be interpreted as a crude and shallow empirical theory predicting the
behavior of people making decisions. This theory is practical in suitably
limited domains, and everyone in fact makes use of at least some
aspects of it in predicting the behavior of others. At the same time, the
behavior of people is often at variance with the theory. The departure
is sometimes flagrant, in which case our attitude toward it is much like
that we hold toward a slip in logic, calling the departure a mistake and
attributing it to such things as accident and subconscious motivation.
Or, the departure may be detectable only by a long chain of argument
or calculation, the possibilities becoming increasingly complicated as
new postulates are brought to stand beside P1.

Pursuing the analogy with logic, the main use I would make of P1
and its successors is normative, to police my own decisions for
consistency and, where possible, to make complicated decisions depend on
simpler ones.

Here it is more pertinent than it was in connection with logic that
something be said of why and when consistency is a desideratum, though
I cannot say much. Suppose someone says to me, "I am a rational
person, that is to say, I seldom, if ever, make mistakes in logic. But I
behave in flagrant disagreement with your postulates, because they
violate my personal taste, and it seems to me more sensible to cater to my
taste than to a theory arbitrarily concocted by you." I don't see how
I could really controvert him, but I would be inclined to match his
introspection with some of my own. I would, in particular, tell him that,
when it is explicitly brought to my attention that I have shown a
preference for f as compared with g, for g as compared with h, and for h as
compared with f, I feel uncomfortable in much the same way that I do
when it is brought to my attention that some of my beliefs are logically
contradictory. Whenever I examine such a triple of preferences on my
own part, I find that it is not at all difficult to reverse one of them. In
fact, I find on contemplating the three alleged preferences side by side
that at least one among them is not a preference at all, at any rate not
any more.

There is some temptation to explore the possibilities of analyzing
preference among acts as a partial ordering, that is, in effect to replace
part 1 of the definition of simple ordering by the very weak proposition
$f \le f$, admitting that some pairs of acts are incomparable. This would
seem to give expression to introspective sensations of indecision or
vacillation, which we may be reluctant to identify with indifference. My
own conjecture is that it would prove a blind alley losing much in power
and advancing little, if at all, in realism; but only an enthusiastic
exploration could shed real light on the question.

7 The sure-thing principle 

A businessman contemplates buying a certain piece of property. He
considers the outcome of the next presidential election relevant to the
attractiveness of the purchase. So, to clarify the matter for himself,
he asks whether he would buy if he knew that the Republican candidate
were going to win, and decides that he would do so. Similarly, he
considers whether he would buy if he knew that the Democratic candidate
were going to win, and again finds that he would do so. Seeing that he
would buy in either event, he decides that he should buy, even though
he does not know which event obtains, or will obtain, as we would
ordinarily say. It is all too seldom that a decision can be arrived at on the
basis of the principle used by this businessman, but, except possibly
for the assumption of simple ordering, I know of no other extralogical
principle governing decisions that finds such ready acceptance.

Having suggested what I shall tentatively call the sure-thing
principle, let me give it relatively formal statement thus: If the person
would not prefer f to g, either knowing that the event B obtained, or
knowing that the event $\sim B$ obtained, then he does not prefer f to g.
Moreover (provided he does not regard B as virtually impossible) if he
would definitely prefer g to f, knowing that B obtained, and, if he would
not prefer f to g, knowing that B did not obtain, then he definitely
prefers g to f.

The sure-thing principle cannot appropriately be accepted as a
postulate in the sense that P1 is, because it would introduce new undefined
technical terms referring to knowledge and possibility that would
render it mathematically useless without still more postulates governing
these terms. It will be preferable to regard the principle as a loose one
that suggests certain formal postulates well articulated with P1.

What technical interpretation can be attached to the idea that f
would be preferred to g, if B were known to obtain? Under any
reasonable interpretation, the matter would seem not to depend on the
values f and g assume at states outside of B. There is, then, no loss
of generality in supposing that f and g agree with each other except in
B, that is, that $f(s) = g(s)$ for all $s \in \sim B$. Under this unrestrictive
assumption, f and g are surely to be regarded as equivalent given $\sim B$;
that is, they would be considered equivalent, if it were known that B
did not obtain. The first part of the sure-thing principle can now be
interpreted thus: If, after being modified so as to agree with one
another outside of B, f is not preferred to g; then f would not be preferred
to g, if B were known. The notion will be expressed formally by
saying that $f \le g$ given B.

It is implicit in the argument that has just led to the definition of
$f \le g$ given B that, if two acts f and g are so modified in $\sim B$ as to
agree with each other, then the order of preference obtaining between the
modified acts will not depend on which of the permitted modifications
was actually carried out. Equivalently, if f and g are two acts that do
agree with each other in $\sim B$, and $f \le g$; then, if f and g are modified
in $\sim B$ in any way such that the modified acts f' and g' continue to
agree with each other in $\sim B$, it will also be so that $f' \le g'$. This
assumption is made formally in the postulate P2 below and illustrated
schematically in Figure 1, a kind of diagram I find suggestive in many
such contexts.

In Figure 1, the set S of all states s and the set F of all consequences
f are represented by horizontal and vertical intervals respectively. In
any such diagram an act f, being a function attaching a value $f(s) \in F$
to each $s \in S$, is represented by a graph. This particular diagram graphs
two acts f and g that agree with each other in $\sim B$, and two other acts
f' and g' that also agree with each other in $\sim B$ and arise by modifying
f and g respectively only in $\sim B$, that is, acts agreeing with f and g
respectively in B.

Figure 1

P2 If f, g, and f', g' are such that:

1. in $\sim B$, f agrees with g, and f' agrees with g',

2. in B, f agrees with f', and g agrees with g',

3. $f \le g$;

then $f' \le g'$.

Each of the relations "$\le$ given B" is now easily seen to be a simple
ordering, and the relations "$\ge$, $<$, $>$, $=$ given B" are to be defined
mutatis mutandis. It is noteworthy though obvious that, if $f(s) = g(s)$
for all $s \in B$, then $f = g$ given B.
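
The definition of preference given B can be sketched computationally. The example below rests on assumptions that go beyond the text at this point: consequences are taken to be numbers, and the person's ordering is taken to be induced by a weighted sum with hypothetical `WEIGHTS`, chosen only because such an ordering happens to satisfy P1 and P2. The helper `leq_given` and its `filler` argument are likewise invented for illustration.

```python
from fractions import Fraction

# Hypothetical three-state world with assumed weights on the states.
WEIGHTS = {"s1": Fraction(1, 2), "s2": Fraction(3, 10), "s3": Fraction(1, 5)}

def value(act):
    """Weighted sum of an act's numeric consequences (an assumed ordering device)."""
    return sum(WEIGHTS[s] * act[s] for s in WEIGHTS)

def leq(f, g):
    """f is not preferred to g."""
    return value(f) <= value(g)

def leq_given(f, g, B, filler=0):
    """f <= g given B: modify both acts to agree outside B, then compare."""
    f_mod = {s: (f[s] if s in B else filler) for s in WEIGHTS}
    g_mod = {s: (g[s] if s in B else filler) for s in WEIGHTS}
    return leq(f_mod, g_mod)

f = {"s1": 1, "s2": 5, "s3": 0}
g = {"s1": 2, "s2": 0, "s3": 9}
B = {"s1", "s2"}

print(leq(f, g))           # True:  f is not preferred to g outright
print(leq_given(f, g, B))  # False: given B, f is preferred to g
# P2: the verdict does not depend on how the acts are made to agree outside B.
print({leq_given(f, g, B, filler=c) for c in (0, 7, -3)})  # {False}
```

The comparison given B ignores the values of f and g on the state outside B, which is why the outright and the conditional verdicts can differ.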

It is now possible and instructive to give an atemporal analysis of
the following temporally described decision situation: The person must
decide between f and g after he finds out, that is, observes, whether B
obtains; what will his decision be if he finds out that B does in fact
obtain?

Atemporally, the person can submit himself to the consequences of
f or else of g for all $s \in B$, and, independently, he can submit himself to
the consequences of f or else of g for all $s \in \sim B$; which alternative will
he decide upon for the s's in B?

Finally, describing the situation not only atemporally but also quite 
formally, the person must decide among four acts defined thus: 

h00 agrees with f on B and with f on ~B,
h01 agrees with f on B and with g on ~B,
h10 agrees with g on B and with f on ~B,
h11 agrees with g on B and with g on ~B.

The question at issue now takes this form. Supposing that none of
the four functions is preferred to the particular one hij, is i = 0, or is
i = 1, that is, does hij agree with f on B or with g on B?

It is not hard to see that i can be 1, if and only if f ≤ g given B.
Indeed, if i = 1, h0j ≤ h1j, which means that f ≤ g given B. Arguing in
the opposite direction, if f ≤ g given B, then h00 ≤ h10, and h01 ≤ h11.
Suppose now, for definiteness, h10 ≤ h11; then none of the four
possibilities is preferred to h11; this proves the point in question.
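
The four-act analysis can be traced numerically. The sketch below builds the four acts obtained by splicing f and g on B and on ~B, finds which of them are optimal, and compares the answer with the relation "f ≤ g given B". The states, probabilities, and expected-utility ordering are assumptions of the example only.

```python
PROB = {"s1": 0.5, "s2": 0.3, "s3": 0.2}  # assumed probabilities
B = {"s1", "s2"}


def value(act):
    """Expected utility of an act (dict from state to utility)."""
    return sum(PROB[s] * act[s] for s in PROB)


def splice(on_b, off_b):
    """Act agreeing with on_b on B and with off_b on ~B."""
    return {s: (on_b[s] if s in B else off_b[s]) for s in PROB}


def not_preferred_given_b(f, g):
    """f <= g given B: compare after making both acts agree on ~B."""
    return value(splice(f, f)) <= value(splice(g, f)) + 1e-12


f = {"s1": 1.0, "s2": 4.0, "s3": 2.0}
g = {"s1": 3.0, "s2": 2.0, "s3": 7.0}
acts = (f, g)

# h[(i, j)] agrees with acts[i] on B and with acts[j] on ~B.
h = {(i, j): splice(acts[i], acts[j]) for i in (0, 1) for j in (0, 1)}
best = max(value(act) for act in h.values())
optimal = [ij for ij, act in h.items() if best - value(act) < 1e-12]
i_can_be_1 = any(i == 1 for i, _ in optimal)

print(optimal, i_can_be_1, not_preferred_given_b(f, g))
```

Here the only optimal act is the one agreeing with g on both B and ~B, so the first index can be 1, in agreement with f ≤ g given B.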

It may fairly be said that the person considers B virtually impossible,
or that B is null; if and only if, for all f and g, f ≤ g given B. Indeed,
if B is null in this sense, the values acts take on elements of B are
irrelevant to all decisions.

Several trivial conclusions about null events are listed as a compound
theorem, all components but the last of which have immediate intuitive
interpretations.

Theorem 1

1. The vacuous event, 0, is null.

2. B is null, if and only if, for every f and g, f ≐ g given B.

3. If B is null, and B ⊃ C, then C is null.

4. If ~B is null; f ≤ g given B, if and only if f ≤ g.

5. f ≤ g given S, if and only if f ≤ g.

6. If S is null, f ≐ g for every f and g.

Component 6 of Theorem 1 requires comment, because it corresponds
to a pathological situation. In case S is null, it is not really intuitive
to say that S (and therefore every event) is virtually impossible. The
interpretation is rather that the person simply doesn't care what
happens to him. This is imaginable, especially under a suitably restricted
interpretation of F, but it is uninteresting and will accordingly be ruled
out by a later postulate, P5.

A finite set of events B_i is a partition of B, if B_i ∩ B_j = 0, for i ≠ j,
and ∪_i B_i = B. With this definition, it is easily proved by arithmetic
induction that

Theorem 2 If B_i is a partition of B, and f ≤ g given B_i for each i,
then f ≤ g given B. If, in addition, f < g given B_j for at least one j,
then f < g given B.

Corollary 1 The union of any finite number of null events is null. 

There are still other interesting consequences of Theorem 2, which
may be most conveniently mentioned informally. If, in Theorem 2,
B = S (or, more generally, if ~B is null), it is superfluous to say "given
B" in the conclusions of the theorem. If f ≐ g given B_i for each i,
then f ≐ g given B. So much for the consequences of P2.

Acts that are constant, that is, acts whose consequences are
independent of the state of the world, are of special interest. In particular,
they lead to a natural definition of preference among consequences in
terms of preference among acts. Following ordinary mathematical
usage, f ≡ g will mean that f is identically g, that is, for every s, f(s) = g.
A formal definition of preference among consequences can now
conveniently be expressed thus. For any consequences g and g', g ≤ g',
if and only if, when f ≡ g and f' ≡ g', f ≤ f'.

In the same spirit, meaning can be assigned to such expressions as
f ≤ g, g ≤ f given B, etc., in which an act is compared with a
consequence, and I will freely use such expressions without
defining them explicitly. In particular, for consequences f and g,
f ≤ g given B has a natural meaning, but one that is rendered
superfluous by the next postulate, P3.

Incidentally, it is now evident how awkward for us it would be to
use f(s) for f, because f(s) ≤ g(s) is a statement about the consequences
f(s) and g(s), whereas f ≤ g is a statement about acts, and we will
have frequent need for both sorts of statements.

Suppose that f ≡ g and f' ≡ g', and that g ≤ g'; is it reasonable to
admit that, for some B, f > f' given B? That depends largely on the
interpretation we choose to make of our technical terms, as an example
helps to bring out.

Before going on a picnic with friends, a person decides to buy a
bathing suit or a tennis racket, not having at the moment enough money
for both. If we call possession of the tennis racket and possession of
the bathing suit consequences, then we must say that the consequences
of his decision will be independent of where the picnic is actually held.
If the person prefers the bathing suit, this decision would presumably
be reversed, if he learned that the picnic were not going to be held
near water. Thus the question whether it can happen that f > f'
given B would be answered in the affirmative. But, under the
interpretation of "act" and "consequence" I am trying to formulate, this is
not the correct analysis of the situation. The possession of the tennis
racket and the possession of the bathing suit are to be regarded as acts,
not consequences. (It would be equivalent and more in accordance
with ordinary discourse to say that the coming into possession, or the
buying, of them are acts.) The consequences relevant to the decision
are such as these: a refreshing swim with friends, sitting on a shadeless
beach twiddling a brand-new tennis racket while one's friends swim,
etc. It seems clear that, if this analysis is carried to its limit, the
question at issue must be answered in the negative, and I therefore propose
to assume the negative answer as a postulate. The postulate is so
couched as not only to assert that knowledge of an event cannot
establish a new preference among consequences or reverse an old one, but
also to assert that, if the event is not null, no preference among
consequences can be reduced to indifference by knowledge of an event.

P3 If f ≡ g, f' ≡ g', and B is not null, then f ≤ f' given B, if and
only if g ≤ g'.

Applying Theorem 2, it is obvious that

Theorem 3 If B_i is a partition of B; and if (for all i and s) f_i ≤ g_i,
f(s) = f_i and g(s) = g_i when s ∈ B_i, then f ≤ g given B. If, in
addition, f_j < g_j for some j for which B_j is not null, then f < g given B.

Theorem 3 is logically equivalent to P3 in the presence of P1 and P2,
and Theorem 3 can as easily be given an intuitive basis as the postulate
P3. Therefore the assumption of P3 as a postulate instead of Theorem
3 is only a matter of taste.

Theorem 3 has been widely accepted by the British-American School
of statisticians, special emphasis having been given to it, in connection
with his notion of admissibility, by the late Abraham Wald. I believe,
as will be more fully explained later, that much of its particular
significance for that school stems from the implication that, if several
different people agree in their preferences among consequences, then
they must also agree in their preferences among certain acts.

This brings the present chapter to a natural conclusion, since the
further postulates to be proposed can be more conveniently introduced
in connection with the uses to which they are put in later chapters.

Personal Probability

1 Introduction 

I personally consider it more probable that a Republican president
will be elected in 1996 than that it will snow in Chicago sometime in the
month of May, 1994. But even this late spring snow seems to me more
probable than that Adolf Hitler is still alive. Many, after careful
consideration, are convinced that such statements about probability to a
person mean precisely nothing, or at any rate that they mean nothing
precisely. At the opposite extreme, others hold the meaning to be so
self-evident as to be unanalyzable. An intermediate position is taken
in this chapter, where a particular interpretation of probability to a
person is given in terms of the theory of consistent decision in the face
of uncertainty, the exposition of which was begun in the last chapter.
Much as I hope that the notion of probability defined here is consistent
with ordinary usage, it should be judged by the contribution it makes
to the theory of decision, not by the accuracy with which it analyzes
ordinary usage.

Perhaps the first way that suggests itself to find out which of two
events a person considers more probable is simply to ask him. It might
even be argued, though I think fallaciously, that, since the question
concerns what is inside the person's head, there can be no other method,
just as we have little, if any, access to a person's dreams except through
his verbal report. Attempts to define the relative probability of a pair
of events in terms of the answers people give to direct interrogation
have justifiably met with antipathy from most statistical theorists. In
the first place, many doubt that the concept "more probable to me
than" is an intuitive one, open to no ambiguity and yet admitting no
further analysis. Even if the concept were so completely intuitive,
which might justify direct interrogation as a subject worthy of some
psychological study, what could such interrogation have to do with the
behavior of a person in the face of uncertainty, except of course for his
verbal behavior under interrogation? If the state of mind in question
is not capable of manifesting itself in some sort of extraverbal behavior,
it is extraneous to our main interest. If, on the other hand, it does
manifest itself through more material behavior, that should, at least
in principle, imply the possibility of testing whether a person holds
one event to be more probable than another, by some behavior
expressing, and giving meaning to, his judgment. It would, in short, be
preferable, at least in principle, to interrogate the person, not literally
through his verbal answer to verbal questions, but rather in a figurative
sense somewhat reminiscent of that in which a scientific experiment is
sometimes spoken of as an interrogation of nature. Several schemes of
behavioral, as opposed to direct, interrogation have been proposed.
The one introduced below was suggested to me by a passage of
de Finetti's (on pp. 5-6 of [D2]), though the passage itself does not
emphasize behavioral interrogation.

To illustrate the scheme, our idealized person has just taken two
eggs from his icebox and holds them unbroken in his hand. We wonder
whether he thinks it more probable that the brown one is good than
that the white one is. Our curiosity being real, we are prepared to
pay, if necessary, to have it satisfied. We therefore address him thus:
"We see that you are about to open those eggs. If you will be so
cooperative as to guess that one or the other egg is good, we will pay you
a dollar, should your guess prove correct. If incorrect, you and we
are quits, except that we will in any event exchange your two eggs for
two of guaranteed goodness." If under these circumstances the person
stakes his chance for the dollar on the brown egg, it seems to me to
correspond well with ordinary usage to say that it is more probable to
him that the brown one is good than that the white one is. Though,
of course, I hope for your agreement on this analysis of ordinary usage,
I repeat that it is not really fundamental to the subsequent argument,
as indeed no such lexicographical point could be; for the utility of a
construct or definition depends only secondarily on the aptness of the
expression in terms of which it is couched.

There is a mode of interrogation intermediate between what I have
called the behavioral and the direct. One can, namely, ask the person,
not how he feels, but what he would do in such and such a situation.
In so far as the theory of decision under development is regarded as
an empirical one, the intermediate mode is a compromise between
economy and rigor. But, in the theory's more important normative
interpretation as a set of criteria of consistency for us to apply to our own
decisions, the intermediate mode seems to me to be just the right
one.

Though it entails digression from the main theme, some readers may
be interested in a few words about actual experimentation on strictly
empirical behavioral interrogation. Some key references bearing on
the subject are [M4], [R3], and [W8].

In the first place, a little reflection shows that an experiment in which
human subjects are required to decide among actual acts may be very
expensive in time, money, and effort, especially if the consequences
envisaged are expensive to provide, a point discussed in detail in [W8].
Questions of morality, and even of legality, toward the subject may
further complicate the investigation. For example, Mosteller and
Nogee, as described in Section 3B of [M4], made certain that every
subject in one experiment of theirs would be financially benefited, though
they kept this security secret from the subjects.

There is also a difficulty in principle. Suppose that I wish to
discover a person's preferences among several acts (three acts f, g, and h
are sufficient to bring out the difficulty). If I in good faith offer him the
opportunity to decide among all three, and he decides on f, then there
is no further possibility of discovering what his preference was between
g and h. Suppose, for example, that a hot man actually prefers a swim,
a shower, and a glass of beer, in that order. Once he decides on, and
thereby becomes entitled to, the swim, he can no longer appropriately
be asked to decide between shower and beer. A naive attempt to do so
would result in his deciding between a swim and shower on the one
hand, and a swim and beer on the other, an altogether different
situation from the one intended.

The difficulty can sometimes be met by special devices. For example,
the investigator might wait for a different but "similar" occasion. But
W. Allen Wallis has mentioned to me an interesting and very general
device, which will now be described, with his permission.†

† I have since seen this same device used by M. Allais.

Suppose that the hot man is instructed to rank the three acts in
order, subject to the consideration that two of them will be drawn at
random (e.g., by card drawing or dice rolling), and that he is then to
have whichever of these two acts he has assigned the lower rank. He
is thus called on to select one of six acts, that is, one of the six possible
rankings. If he does, for example, select the ranking {swim, shower,
beer}, it follows easily from the theory of decision thus far developed
that for him swim ≥ shower ≥ beer, barring the farfetched possibility
that he regards one or more of the three drawings as virtually
impossible and provided that his preference among the three acts swim, shower,
beer given any of the three drawings is the same as his original
preference. The investigator could in practice design the drawing in such a
way as to be well satisfied that the required "irrelevance" obtained,
except for very "superstitious" people. This ends the present digression on
actual behavioral interrogation.
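
The ranking device lends itself to a small computation. The sketch below enumerates the six rankings and scores each by the expected utility of what the subject would receive. The numerical utilities, and the assumption that each of the three pairs is equally likely to be drawn, are hypothetical choices for the example; the argument in the text needs only that no drawing be regarded as virtually impossible.

```python
from itertools import combinations, permutations

# Hypothetical utilities for a person who prefers swim to shower to beer.
UTILITY = {"swim": 3.0, "shower": 2.0, "beer": 1.0}


def expected_payoff(ranking):
    """Average utility received when a pair is drawn (each pair equally
    likely, by assumption) and the subject gets whichever act of the pair
    he placed earlier in his ranking."""
    position = {act: k for k, act in enumerate(ranking)}
    pairs = list(combinations(UTILITY, 2))
    total = 0.0
    for a, b in pairs:
        received = a if position[a] < position[b] else b
        total += UTILITY[received]
    return total / len(pairs)


# The ranking the subject should report is the one with the highest payoff.
best = max(permutations(UTILITY), key=expected_payoff)
print(best, expected_payoff(best))
```

Under these assumptions the best ranking to report is the true order of preference, which is what allows the investigator to read all the pairwise preferences off a single decision.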

The purpose of this chapter is to explore the concept of personal
probability† that was indicated in the example about the two eggs.
The concept will be put on a formal basis in § 2 by introducing two new
postulates, P4 and P5, to be used in conjunction with P1-3. This will
lead to a formal analysis of the notion that one event is no more
probable than another. Several deductions about this notion reminiscent
of mathematical properties ordinarily attributed to probability will be
made, but only in § 3, after adjunction of still another postulate, P6,
can the notion be connected quantitatively with what mathematicians
ordinarily call mathematical probability. Section 4 is devoted to some
mathematically technical criticisms of the notion of personal
probability, which can safely be skipped or skimmed by those not interested
in such matters. Section 5 discusses conditional personal probability;
6, the approach to certainty through a long sequence of conditionally
independent relevant observations; and 7, an extension of the concept
of a sequence of independent events, particularly interesting from the
viewpoint of personal probability.

2 Qualitative personal probability

When I spoke in the introductory section of offering the person a
dollar if his guess about the egg proved correct, it was tacitly assumed
that his guess would not be affected by the amount of the prize offered.
That seems to me correct in principle. It would, for example, seem
unreasonable for the person with the two eggs to reverse his decision if
the prize were reduced from a dollar to a penny. He might reverse
himself in going from a penny to a dollar, because he might not have
found it worth his trouble to give careful consideration for too small a
prize. I think the anomaly can best be met by deliberately pretending
that consideration costs the person nothing, though that is far from the
truth in actual complicated situations. It might, on the other hand,
be stimulating, and it is certainly more realistic, to think of
consideration or calculation as itself an act on which the person must decide.
Though I have not explored the latter possibility carefully, I suspect
that any attempt to do so formally leads to fruitless and endless
regression.

† The term "personal probability" was suggested to me orally by Thornton C.
Fry. Some other terms suggested for the same concept are "subjective probability,"
"psychological probability," and "degree of conviction."

To offer a prize in case A obtains means to make available to the
person an act f_A such that

(1)   f_A(s) = f    for s ∈ A,
      f_A(s) = f'   for s ∈ ~A,

where f' < f. The assumption that on which of two events the person
will choose to stake a given prize does not depend on the prize itself
is expressed by the following postulate, which looks formidable only
because it contains four definitions like (1). The reader may find it
helpful to graph an instance of the postulate in the spirit of Figure
2.7.1.

P4 If f, f', g, g'; A, B; f_A, f_B, g_A, g_B are such that:

1.   f' < f,   g' < g;

2a.  f_A(s) = f,    g_A(s) = g     for s ∈ A,
     f_A(s) = f',   g_A(s) = g'    for s ∈ ~A;

2b.  f_B(s) = f,    g_B(s) = g     for s ∈ B,
     f_B(s) = f',   g_B(s) = g'    for s ∈ ~B;

3.   f_A ≤ f_B;

then g_A ≤ g_B.

In the light of P4, it will be said that A is not more probable than
B, abbreviated A ≤ B, if and only if when f' < f and f_A, f_B are such
that

     f_A(s) = f    for s ∈ A,     f_A(s) = f'   for s ∈ ~A,
     f_B(s) = f    for s ∈ B,     f_B(s) = f'   for s ∈ ~B,

then f_A ≤ f_B.
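
A small numerical sketch may make the content of P4 and of the definition just given more concrete. It assumes, purely for illustration, that the person's preferences are represented by expected utility with the state probabilities shown; under that assumption the verdict on which of two events to stake a prize does not depend on the size of the prize.

```python
# Assumed probabilities of three states; not part of the formal theory here.
PROB = {"s1": 0.2, "s2": 0.3, "s3": 0.5}


def stake(event, prize, blank):
    """The act paying `prize` if the event obtains and `blank` otherwise."""
    return {s: (prize if s in event else blank) for s in PROB}


def value(act):
    """Expected utility of an act (dict from state to utility)."""
    return sum(PROB[s] * act[s] for s in PROB)


def not_more_probable(a, b, prize, blank):
    """A <= B as revealed by staking the same prize on A and on B."""
    if not blank < prize:
        raise ValueError("the prize must be preferred to the blank")
    return value(stake(a, prize, blank)) <= value(stake(b, prize, blank)) + 1e-12


A = {"s1"}
B = {"s2", "s3"}
prizes = [(1.0, 0.0), (0.01, 0.0), (100.0, -5.0)]
verdicts = {not_more_probable(A, B, prize, blank) for prize, blank in prizes}
print(verdicts)
```

All three prizes give the same verdict, A ≤ B, which is the independence that P4 postulates directly, without appeal to numerical probabilities.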

The assumption that there is at least one worth-while prize is
innocuous, for, though a context failing to satisfy it might arise, such a
context would be too trivial to merit study. I therefore propose the
following postulate.

P5 There is at least one pair of consequences f, f' such that f' < f.

All the implications to be deduced from P1-5 for some time to come
are themselves implications of the three easily established conclusions,
which are introduced by the following definition and theorem.

A relation ≤· between events is a qualitative probability; if and only
if, for all events B, C, D,

1. ≤· is a simple ordering,

2. B ≤· C, if and only if B ∪ D ≤· C ∪ D, provided B ∩ D =
C ∩ D = 0,

3. 0 ≤· B, 0 <· S.

It may be helpful to remark that the second part of the above
definition says, in effect, that it will not affect the person's guess to offer
him a consolation prize in case neither B nor C obtains, but D happens
to.
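
The three defining conditions can be verified by brute force on a small example. The sketch below takes a four-element set with assumed weights (a numerical measure is used here only as a convenient source of an ordering) and checks each condition over all events.

```python
from fractions import Fraction
from itertools import combinations

# Assumed weights on a four-element set S; exact fractions avoid rounding.
WEIGHTS = {
    "a": Fraction(1, 10),
    "b": Fraction(2, 10),
    "c": Fraction(3, 10),
    "d": Fraction(4, 10),
}
S = frozenset(WEIGHTS)
EMPTY = frozenset()
EVENTS = [
    frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)
]


def leq(b, c):
    """B is not more probable than C, as induced by the assumed weights."""
    return sum(WEIGHTS[s] for s in b) <= sum(WEIGHTS[s] for s in c)


def is_qualitative_probability(relation):
    # 1. Simple ordering: any two events comparable, and transitivity.
    comparable = all(relation(b, c) or relation(c, b)
                     for b in EVENTS for c in EVENTS)
    transitive = all(relation(b, d)
                     for b in EVENTS for c in EVENTS for d in EVENTS
                     if relation(b, c) and relation(c, d))
    # 2. Adjoining a disjoint event D to both sides changes nothing.
    additive = all(relation(b, c) == relation(b | d, c | d)
                   for b in EVENTS for c in EVENTS for d in EVENTS
                   if not (b & d) and not (c & d))
    # 3. The vacuous event is least, and strictly below S.
    bounded = (all(relation(EMPTY, b) for b in EVENTS)
               and not relation(S, EMPTY))
    return comparable and transitive and additive and bounded


print(is_qualitative_probability(leq))
```

A relation that violates one of the conditions, for example one comparing events only by their number of elements squared plus an arbitrary tie-break, can be passed to the same function to see which property fails.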

Theorem 1 The relation ≤ as applied to events is a qualitative
probability.

You will have no difficulty in proving that Theorem 1 follows from
P1-5. Theorem 1 has many consequences of the sort one would expect
if ≤ meant "not more probable than" in any sense having the
mathematical properties ordinarily attributed to numerical probability. This
is illustrated by the following list of exercises, which should not only
be proved formally, but also interpreted intuitively. One easy exercise
not included in the list below, because it is not strictly a consequence
of Theorem 1 alone, is to show that B ≐ 0, if and only if B is a null
event.

Exercises 

1. If B ⊂ C, then 0 ≤ B ≤ C ≤ S.

2a. If B ∩ D = C ∩ D = 0; then B < C, if and only if B ∪ D <
C ∪ D.

2b. If 0 < C, and B ∩ C = 0; then B < B ∪ C.

3. If B ≤ C, then ~C ≤ ~B; and conversely. Hint: Draw a Venn
diagram of the fourfold partition B ∩ C, ~B ∩ C, B ∩ ~C, ~B ∩ ~C.

4a. If B ≤ C, and C ∩ D = 0, then B ∪ D ≤ C ∪ D.

4b. If B ≤ 0; then B ∪ C ≐ C, and B ≐ 0.

4c. If S ≤ B, then B ∩ C ≐ C, and B ≐ S.

4d. If B ∪ D ≤ C ∪ D, and B ∩ D = 0; then B ≤ C.

5a. If B1 ≤ C1, B2 ≤ C2, and C1 ∩ C2 = 0; then B1 ∪ B2 ≤ C1 ∪
C2. Hint: Exhibit B2 and C1 in the form B2 = B2' ∪ Q, C1 = C1' ∪ Q
with B2', C1', Q disjoint. Justify the following calculation, step by step.

     B1 ∪ B2' ≤ C1 ∪ B2' = C1' ∪ B2 ≤ C1' ∪ C2,

whence B1 ∪ B2 ≤ C1 ∪ C2.

5b. If B1 ∪ B2 < C1 ∪ C2 and B1 ∩ B2 = 0; then B1 < C1 or
B2 < C2.

6. If B ≤ ~B and C ≥ ~C, then B ≤ C; equality holding in the
conclusion, if and only if it holds in both parts of the hypothesis.

3 Quantitative personal probability 

As I have said, the exercises terminating the preceding section
suggest a close mathematical parallelism between personal probability and
the mathematical properties ordinarily attributed to probability, though
the postulates assumed thus far do not (as could easily be demonstrated)
make it possible to deduce from this parallelism the unambiguous
assignment of a numerical probability to each event. But, if, for example
(following de Finetti [D2]), a new postulate asserting that S can be
partitioned into an arbitrarily large number of equivalent subsets were
assumed, it is pretty clear (and de Finetti explicitly shows in [D2])
that numerical probabilities could be so assigned. It might fairly be
objected that such a postulate would be flagrantly ad hoc. On the
other hand, such a postulate could be made relatively acceptable by
observing that it will obtain if, for example, in all the world there is a
coin that the person is firmly convinced is fair, that is, a coin such that
any finite sequence of heads and tails is for him no more probable than
any other sequence of the same length, though such a coin is, to be sure,
a considerable idealization.

After some general and abstract discussion of the mathematical
connection between qualitative and quantitative probability, a postulate,
P6, will be proposed, which, though logically actually stronger than the
assumption that there are partitions of S into equivalent events, seems
to me even easier to accept. Once P6 is accepted, there will scarcely
again be any need to refer directly to qualitative probability.

To begin with, let me say precisely what is meant, in the present
context, by a probability measure, this being the standard term for
what I would here otherwise prefer to call a quantitative probability,
and what it means for a probability measure to be in agreement with
a qualitative probability.

A probability measure on a set S is a function P(B) attaching to
each B ⊂ S a real number such that

1. P(B) ≥ 0 for every B.

2. If B ∩ C = 0, P(B ∪ C) = P(B) + P(C).

3. P(S) = 1.

This definition, or something very like it, is at the root of all ordinary
mathematical work in probability.

If S carries a probability measure P and a qualitative probability
≤· such that, for every B, C, P(B) ≤ P(C), if and only if B ≤· C;
then P (strictly) agrees with ≤·. If B ≤· C implies P(B) ≤ P(C),
then P almost agrees with ≤·. This terminology is obviously
consistent in that, if P agrees, that is, strictly agrees, with ≤·, P also
almost agrees with ≤·. It is also easily seen that, if P agrees with ≤·,
then knowledge of P implies knowledge of ≤·. But, if P only almost
agrees with ≤·, it may happen, as examples in § 4 show, that P(B) =
P(C), though B <· C, so that knowledge of P may imply only imperfect
knowledge of ≤·.
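
The gap between strict and almost agreement can be seen in a toy case. The sketch below is an illustration of my own construction, not one of the examples of § 4: events of a two-element set are ordered by a measure P, with ties in P broken by a second measure Q. Both measures and the tie-breaking rule are assumptions of the example.

```python
from fractions import Fraction
from itertools import combinations

S = ("a", "b")
P_W = {"a": Fraction(1, 2), "b": Fraction(1, 2)}  # assumed measure P
Q_W = {"a": Fraction(1, 3), "b": Fraction(2, 3)}  # assumed tie-breaker Q
EVENTS = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]


def measure(weights, event):
    """Total weight of an event under the given weights."""
    return sum((weights[s] for s in event), Fraction(0))


def leq_dot(b, c):
    """B <=. C: compare P first, and break ties in P by Q."""
    return ((measure(P_W, b), measure(Q_W, b))
            <= (measure(P_W, c), measure(Q_W, c)))


def almost_agrees():
    """B <=. C implies P(B) <= P(C)."""
    return all(measure(P_W, b) <= measure(P_W, c)
               for b in EVENTS for c in EVENTS if leq_dot(b, c))


def strictly_agrees():
    """P(B) <= P(C) if and only if B <=. C."""
    return all((measure(P_W, b) <= measure(P_W, c)) == leq_dot(b, c)
               for b in EVENTS for c in EVENTS)


print(almost_agrees(), strictly_agrees())
```

Here P({a}) = P({b}) although {a} <· {b}, so P almost agrees with the ordering without strictly agreeing with it; knowing P alone would not reveal the preference between the two events.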

The rest of this section is mainly a study of qualitative probabilities
generally, with a view to discovering interesting conditions under which
there is a probability measure that agrees, either strictly or almost,
with a given qualitative probability. These conditions suggest a new
postulate governing the special qualitative probability ≤. The work
is necessarily rather tedious and burdened with detail. It will,
therefore, be wise for most readers to skim over the material, omitting the
proofs but noticing the more obvious logical connections among the
theorems and definitions. Some may then find themselves sufficiently
interested in the details to return and read or supply the proofs, as the
case may require. Others may safely go forward. Here, as elsewhere,
technical terms of interest for the moment only are introduced with
italics rather than boldface.

An n-fold almost uniform partition of B is an n-fold partition of B
such that the union of no r elements of the partition is more probable
than that of any r + 1 elements.

Theorem 1 If there exist n-fold almost uniform partitions of B for
arbitrarily large values of n, then there exist m-fold almost uniform
partitions for every positive integer m.

Proof. Let B_i, i = 1, ..., n, be an n-fold almost uniform partition
(of B) with n ≥ m². Using the euclidean algorithm, let n be written
n = am + b, where a and b are integers such that m ≤ a and 0 ≤ b <
m. Now let C_j, j = 1, ..., m, be any m-fold partition such that each
C_j is the union of a or a + 1 of the B_i's. The union of any r of the C_j's,
r < m, is the union of from ar to (a + 1)r of the B_i's and the union of
r + 1 of the C_j's is that of from a(r + 1) to (a + 1)(r + 1) of the B_i's.
Since r < m ≤ a, (a + 1)r = ar + r < ar + a = a(r + 1). ♦
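
The counting in this proof is easy to mechanize. The following sketch groups the indices of an n-fold partition into m blocks of size a or a + 1 and checks the inequality on which the proof turns; the function name and the choice of consecutive blocks are illustrative assumptions, since the proof allows any grouping with those block sizes.

```python
def coarsen(n, m):
    """Group indices 0..n-1 into m consecutive blocks of size a or a + 1,
    where n = a*m + b with 0 <= b < m (the division used in the proof)."""
    a, b = divmod(n, m)
    if a < m:
        raise ValueError("the proof needs n >= m*m, so that m <= a")
    blocks, start = [], 0
    for j in range(m):
        size = a + 1 if j < b else a  # the first b blocks get one extra index
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


n, m = 17, 4
blocks = coarsen(n, m)
sizes = [len(block) for block in blocks]
a = n // m

# Any r blocks hold at most (a + 1) * r indices, and any r + 1 blocks hold
# at least a * (r + 1); the proof needs the first to fall short of the second.
counting_ok = all((a + 1) * r < a * (r + 1) for r in range(1, m))
print(sizes, counting_ok)
```

Because r blocks always contain fewer elements of the fine partition than r + 1 blocks do, almost uniformity of the fine partition carries over to the coarse one.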

Theorem 2 If there exist n-fold almost uniform partitions of S for
arbitrarily large values of n, then there is one and only one probability
measure P that almost agrees with ≤·. Furthermore, for any ρ, 0 ≤ ρ
≤ 1, any B ⊂ S, and the unique P just defined, there exists C ⊂ B
such that P(C) = ρP(B).†

† Technical note: The mathematical essence of the terminal conclusion of this
theorem, and other conclusions related to it, are given by Sobczyk and Hammer
[S15]. It might be conjectured, in analogy with countably additive measures, that
this conclusion means only that P is non-atomic, but that conjecture is false [N5].

Proof. The proof is broken into a sequence of easy steps, left, for
the most part, to the reader. These steps are grouped in blocks, only
the last step in each being needed in the proof of later steps.

1. There exist n-fold almost uniform partitions of S for every
positive n.

2a. If p_1, ..., p_n are real numbers such that 0 ≤ p_1 ≤ p_2 ≤ ... ≤ p_n,
and Σ p_i = 1, then

(1)   $\sum_{i=1}^{r} p_i \le r/n, \qquad r = 1, \ldots, n.$

2b. If further

      $\sum_{i=1}^{r+1} p_i \ge \sum_{i=n-r+1}^{n} p_i \quad \text{for } r = 1, \ldots, n-1,$

then

(2)   $\sum_{i=1}^{r} p_i \ge (r-1)/n, \quad \text{and} \quad \sum_{i=n-r+1}^{n} p_i \le (r+1)/n.$

2c. The sum of any r of the p_i's lies between (r − 1)/n and (r + 1)/n.

2d. If P almost agrees with ≤·, and C(r, n) denotes here and later
in this proof any union of r elements of any n-fold almost uniform
partition (not necessarily the same from one context to another), then

(3)   $(r-1)/n \le P(C(r, n)) \le (r+1)/n.$

3. Let k(B, n) denote the largest integer r (possibly zero) such that
some C(r, n) is not more probable than B. The function k(B, n) is
well-defined, and 0 ≤ k(B, n) ≤ n.

4a. For any P that almost agrees with ≤·,

(4)   $(k(B, n) - 1)/n \le P(B) \le (k(B, n) + 2)/n.$

4b. At most one P can almost agree with ≤·.

5a. If B_i and C_i are n-fold partitions (not necessarily almost uniform)
so indexed that B_1 ≤· B_2 ≤· ... ≤· B_n, and C_1 ≥· C_2 ≥· ... ≥· C_n;
then

(5)   $\bigcup_{i=n-r}^{n} B_i \;\ge\!\cdot\; \bigcup_{i=n-r}^{n} C_i, \qquad r = 0, \ldots, n-1.$

5b. If in addition the two partitions are almost uniform, then

(6)   $\bigcup_{i=1}^{r} C_i \;\le\!\cdot\; \bigcup_{i=1}^{r+2} B_i, \qquad r = 1, \ldots, n-2.$

(Proof: $\bigcup_{1}^{r+2} B_i \ge\!\cdot \bigcup_{n-r}^{n} B_i \ge\!\cdot \bigcup_{n-r}^{n} C_i \ge\!\cdot \bigcup_{1}^{r} C_i$.)

5c. The union of any r elements of one almost uniform partition is
not more probable than the union of any r + 2 elements of another.

5d. If B ∩ C = 0, then

(7)   $k(B, n) + k(C, n) - 2 \le k(B \cup C, n) \le k(B, n) + k(C, n) + 1.$
6a. If a C(r, m) is not more probable than a C(s, n), then



(Consider an mn-fold almost uniform partition, and use the easily
established fact that the union of any t + 2 elements of an almost
uniform partition is actually more probable than that of any t elements.)


6b.   $\left| \dfrac{k(B, m)}{m} - \dfrac{k(B, n)}{n} \right| \le \dfrac{3}{m} + \dfrac{3}{n} + \dfrac{1}{mn}.$


6c. It is meaningful to define P(B) by

(9)   $P(B) = \lim_{n \to \infty} \dfrac{k(B, n)}{n},$

that is, the limit exists.

7. P(B), as just defined, is a probability measure, and the only one
that almost agrees with ≤·.

8a. There exist two infinite sequences of sets C_n and D_n contained
in B such that

1. C_n ∩ D_n = 0,

2. C_n ⊂ C_{n+1} and D_n ⊂ D_{n+1},

3. P(C_n) ≥ ρP(B) − n^{-1},

4. P(D_n) ≥ (1 − ρ)P(B) − n^{-1}.

8b. P(∪_n C_n) ≥ ρP(B), P(∪_n D_n) ≥ (1 − ρ)P(B), and (∪_n C_n) ∩
(∪_n D_n) = 0.

8c. P(∪_n C_n) = ρP(B). ♦
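
The way the proof recovers P from counting may be easier to see in a simplified numerical setting. The sketch below assumes that the ordering of events is already given by a measure and that only exactly uniform n-fold partitions are used, so that k(B, n) is just the largest r with r/n ≤ P(B); both assumptions are stronger than anything the theorem requires, and the value 2/7 is arbitrary.

```python
from fractions import Fraction
from math import floor


def k(prob_b, n):
    """Largest r such that a union of r cells of an exactly uniform n-fold
    partition (each cell of probability 1/n, by assumption) is not more
    probable than B."""
    return floor(prob_b * n)


p_b = Fraction(2, 7)  # hypothetical value of P(B)
ns = (1, 10, 100, 1000)
approximations = [Fraction(k(p_b, n), n) for n in ns]

for n, approx in zip(ns, approximations):
    # Inequality (4) of the proof, in this simplified setting.
    assert Fraction(k(p_b, n) - 1, n) <= p_b <= Fraction(k(p_b, n) + 2, n)
    print(n, approx, float(p_b - approx))
```

The error of k(B, n)/n is less than 1/n in this setting, so the ratios converge to P(B), which is the content of definition (9) in the idealized case.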

A few technical terms of localized interest only are now introduced.
If and only if, for every B >· 0, there is a partition of S, no element of
which is more probable than B; ≤· is fine. B and C are almost
equivalent, written B ≃· C; if and only if for all non-null G and H such that
B ∩ G = C ∩ H = 0, B ∪ G ≥· C and C ∪ H ≥· B. It is obvious
that equivalent events are also almost equivalent. Finally, if and only
if every pair of almost equivalent events are equivalent, ≤· is tight.

Theorem 3 

Hyp. ≤· is fine.

Concl. 1. If B >· 0, and C >· 0, there exists D ⊂ C such that
0 <· D ≤· B.

2. If B ≃· G, C ≃· H, and B ∩ C = G ∩ H = 0; then B ∪ C
≃· G ∪ H.

3. If B ≃· C, G ≃· H, B ∪ C ≃· G ∪ H, and B ∩ C = G ∩ H = 0;
then B ≃· G.

4. Any partition of S into almost equivalent events is an almost
uniform partition.

5. Any event can be partitioned into two almost equivalent events.

6. Any event can be partitioned into 2^n almost equivalent events,
for any non-negative integer n.

7. There exists one and only one P that almost agrees with ≤·.
For any B, ρ (0 ≤ ρ ≤ 1), and the unique P just defined, there
exists C ⊂ B such that P(C) = ρP(B). If B >· 0, P(B) > 0. Finally,
B ≃· C, if and only if P(B) = P(C).

Proof. The parts of the conclusion are so arranged that each is easy
to prove in the light of its predecessors, but proofs for Parts 3 and 5
are given below. It may be remarked that all parts are trivial
consequences of the last one and have therefore relatively little importance in
themselves.

Part 3 Suppose, for example, PUP<*G, PnP = 0, and 
P > • 0 , and consider two cases 

(a) It P U G <*/S, it may be assumed without loss of generality 
that G n P = 0, whence (PUG)UP>-GUP Therefore, O-H 

Let P be partitioned into two non-null events Pi and P 2 ,* then (since 
it IS absurd to suppose that the part of G outside of G is null, which 
would imply G >• G >• P U P) there is in G an E' such that GRP' 
= 0 <-P' <-P 2 Now GUP' >-H U P' >-G >• (P U Pi) U P 2 , 
whence G >• P U Pi, which is absurd, 

(b) If P U G = • P, it can (setting aside the easy special case G fl G 
= •0) be shown successively that* PT U G =*P; G <'P U P <-G, 
where P >• 0 and P e G R G, (P R H) U P <• (G R G), (GRP) 
< - (G R P) ; and P U P < • G, which establishes a contradiction.

Part 5. There exists a sequence of threefold partitions of B, say
C_n, D_n, and G_n, such that:

1. C_n ∪ G_n ≥· D_n, and D_n ∪ G_n ≥· C_n,

2. C_{n+1} ⊃ C_n, D_{n+1} ⊃ D_n,

3. ~G_{n+1} ∩ G_n ≥· G_{n+1}; whence G_n contains two disjoint events
each at least as probable as G_{n+1}.

For any H >· 0, G_n <· H for sufficiently large n, as may be seen by
considering some m-fold partition no element of which is more probable
than H, and letting n be such that 2^{n-1} ≥ m. If G_n were more probable
than H and therefore more probable than each element of the partition,
it would follow that the union of all elements of the partition, namely
S, is less probable than G_1, which would be absurd.

The two events B_1 = ∪_n C_n, B_2 = (∪_n D_n) ∪ (∩_n G_n) partition B
in the required fashion. ◆

Corollary 1 If ≤· is both fine and tight, the only probability
measure that almost agrees with ≤· strictly agrees with it, and there
exist partitions of S into arbitrarily many equivalent events.

Theorem 4 ≤· is both fine and tight, if and only if, for every B <· C,
there exists a partition of S the union of each element of which with B
is less probable than C.

The proof of this theorem is easy.

In the light of Theorems 3 and 4, I tentatively propose the following
postulate, P6', governing the relation ≤ among events, and thereby
the relation ≤ among acts.

P6' If B < C, there exists a partition of S the union of each element
of which with B is less probable than C.

It seems to me rather easier to justify the assumption of P6', which
says in effect that ≤ is both fine and tight, than to justify the
assumption, which was made by de Finetti [D2] and by Koopman [K9], [K10],
[K11] in closely related contexts, that there exist partitions of S into
arbitrarily many equivalent events, though logically P6' implies that
assumption and somewhat more. Suppose, for example, that you yourself
consider B < C, that is, that you would definitely rather stake a
gain in your fortune on C than on B. Consider the partition of your
own world into 2^n events each of which corresponds to a particular
sequence of n heads and tails, thrown by yourself, with a coin of your
own choosing. It seems to me that you could easily choose such a
coin and choose n sufficiently large so that you would continue to
prefer to stake your gain on C, rather than on the union of B and any
particular sequence of n heads and tails. For you to be able to do so,
you need by no means consider every sequence of heads and tails equally
probable.
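
The coin argument can be illustrated numerically. The following is only a sketch under assumptions that are not in the text: numerical probabilities P(B) and P(C) are supposed given, the tosses are supposed independent with a fixed probability of heads, and the union of B with a sequence is bounded above by the sum of the two probabilities. The function name smallest_n and its parameters are hypothetical.

```python
def smallest_n(p_b, p_c, p_heads, n_max=10_000):
    """Least n for which P(B) + (probability of the likeliest n-toss sequence) < P(C).

    Assumptions (illustrative only): the tosses are independent, each falling
    heads with probability p_heads, and P(B union E) <= P(B) + P(E) is used as
    a crude upper bound for every sequence event E.
    """
    if not p_b < p_c:
        raise ValueError("the sketch requires P(B) < P(C)")
    likeliest = max(p_heads, 1.0 - p_heads)  # per-toss probability of the likelier face
    for n in range(1, n_max + 1):
        if p_b + likeliest ** n < p_c:
            return n
    return None  # no suitable n found, e.g. for a two-headed coin


# A coin need not be fair: a biased coin merely calls for a larger n.
n_fair = smallest_n(0.30, 0.31, 0.5)
n_biased = smallest_n(0.30, 0.31, 0.6)
```

With these illustrative numbers the biased coin requires more tosses than the fair one, which agrees with the remark that the sequences need not be considered equally probable.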

It would, however, be disingenuous not to mention that some who
have worked on a closely related concept of probability, notably Keynes
[K4] and Koopman [K9], [K10], [K11], would object to P6' precisely
because it implies that the agreement between numerical probability
and qualitative probability is strict. Koopman, for example, holds
that, if A ⊃ B and A ≠ B, then A is necessarily more probable than
B, though the numerical probability of A may well be the same as that
of B. Thus, if a marksman shoots at a wall, it is logically contradictory
that his bullet should fall nowhere at all, but it is logically consistent
that a prescribed mathematically ideal point on the bullet should strike
a prescribed mathematically ideal line on the wall. Since the event of
the prescribed point hitting a prescribed line is logically possible,
Koopman would insist that the event is more probable than the vacuous
event, namely that the bullet goes nowhere, though the numerical
probability of both events is zero. I do not take direct issue with
Koopman, because he is presumably talking about a somewhat different
concept of probability from the particular relation ≤, but I do not think
it appropriate to suppose that the person would distinctly rather stake a
gain on the line than on the null set. The issue is not really either an
empirical or a normative one, because the point and line in question
are mathematical idealizations. If the point and line are replaced by a
dot and a band, respectively, then, of course, no matter how small the
dot and band may be, the probability of the one hitting the other is
greater than that of the vacuous event. But it seems to me entirely
a matter of taste, conditioned by mathematical experience, to decide
what idealization to make if the dot and band are replaced by their
idealized limits. So much for hair splitting.

As far as the theory of probability per se is concerned, postulate P6'
is all that need be assumed, but in Chapter 5 a slightly stronger
assumption will be needed that bears on acts generally, not only on those
very special acts by which probability is defined. Therefore, I am about
to propose a postulate, P6, that obviously implies P6' and will therefore
supersede it. This stronger postulate seems to me acceptable for the
same reason that P6' itself does.

P6 If g < h, and f is any consequence; then there exists a partition
of S such that, if g or h is so modified on any one element of the
partition as to take the value f at every s there, other values being
undisturbed, then the modified g remains less than h, or g remains less
than the modified h, as the case may require.

4 Some mathematical details 

Are there qualitative probabilities that are both fine and tight, that
are fine but not tight, that are tight but not fine, that are neither fine
nor tight but do have one and only one almost agreeing probability
measure? Examples answering all these questions in the affirmative
will be exhibited in this section.

To indicate a different topic that will also be treated here, those of
you who have had more than elementary experience with mathematical
treatments of probability know that it is not usual to suppose, as has
been done here, that all sets have a numerical probability, but rather
that a sufficiently rich class of sets do so, the remainder being
considered unmeasurable. Again, it is usual to suppose that, if each of an
infinite sequence of disjoint sets is measurable, the probability of their
union is the sum of their probabilities, that is, probability measures
are generally assumed to be countably additive. But the theory being
developed here does assume that probability is defined for all events,
that is, for all sets of states, and it does not imply countable
additivity, but only finite additivity. The present section not only
answers the questions raised in the preceding paragraph, but also
discusses the relation of the notions of limited domain of definition and
of countable additivity to the theory of probability developed here. The
general conclusions of this discussion are: First, there is no technical
obstacle to working with a limited domain of definition, and, except for
expository complications, it might have been mildly preferable to have
done so throughout. Second, it is a little better not to assume countable
additivity as a postulate, but rather as a special hypothesis in certain
contexts. A different and much more extensive treatment of these
questions has been given by de Finetti [D1].

Finally, before entering upon the main technical work of this section,
one easy question about the relation between qualitative and
quantitative probability will be answered and several as yet unanswered
ones will be raised.

Are there qualitative probabilities without any strictly agreeing
measure? Yes, because any qualitative probability that is fine but not
tight is easily shown to provide an example. It is, however, an open
question, stressed by de Finetti [D5], whether a qualitative probability
on a finite S always has a strictly agreeing measure. It would also be
technically interesting to know about the existence of almost agreeing
measures in the same context.

Are there qualitative probabilities without any almost agreeing
measure? I do not know.

The matters to be treated in the rest of this section are rather
technical mathematically, and, though I would not delete them altogether,
it does not seem justifiable to lay the necessary groundwork for
presenting them in an elementary fashion. Some may, therefore, find it
necessary to skip the rest of this section altogether, or to skim it rather
lightly.

It is well known that there does not exist a countably additive
probability measure defined for every subset of the unit interval, agreeing
with Lebesgue measure on those sets where Lebesgue measure is defined,
and assigning the same measure to each pair of congruent sets
(Theorem 41, p. 276 of [H2]). On the other hand, there do exist finitely
additive probability measures agreeing with Lebesgue measure on those
sets for which Lebesgue measure is defined, and assigning the same
measure to each of any pair of congruent sets, cf. p. 32 of [B4]. The
existence of such measures shows, among other things, that a finitely
additive measure need not be countably additive. Again, calling such
a finitely additive extension of Lebesgue measure P and defining B ≤· C
to mean P(B) ≤ P(C), we see an example of a qualitative probability
that is both fine and tight.

An example of a qualitative probability that is tight but not fine may
be constructed by taking for S two unit intervals, S_1 and S_2, in each
of which finitely additive extensions of Lebesgue measure, P_1 and P_2,
are defined. The generic set B in this example is therefore partitioned
into B_1 = B ∩ S_1 and B_2 = B ∩ S_2, respectively. For this example,
let B ≤· C, if, and only if P_1(B_1) < P_1(C_1), or else P_1(B_1) = P_1(C_1),
and P_2(B_2) ≤ P_2(C_2). This ≤· is not fine, because, for example, S
cannot be partitioned into events none of which is more probable than
S_2. On the other hand, it is easily seen to be tight.

Next, take S to be the union of S_1 and S_2 with the measures P_1
and P_2 as defined in the preceding example, but modify the definition
of ≤·, saying B ≤· C, if and only if P_1(B_1) + P_2(B_2) < P_1(C_1) +
P_2(C_2), or else P_1(B_1) + P_2(B_2) = P_1(C_1) + P_2(C_2), and P_1(B_1) ≤
P_1(C_1). This is an example of a qualitative probability that is fine but
not tight.
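
The two orderings just described can be tried out in a small sketch. It rests on a simplifying assumption that is not in the text: an event B is represented only by the pair of numbers (P_1(B_1), P_2(B_2)), so the sets themselves are suppressed. The function names are hypothetical.

```python
from fractions import Fraction as F


def leq_lex(b, c):
    """Tight-but-not-fine example: compare P_1 first, then P_2 on ties."""
    return b[0] < c[0] or (b[0] == c[0] and b[1] <= c[1])


def leq_sum(b, c):
    """Fine-but-not-tight example: compare P_1 + P_2 first, then P_1 on ties."""
    sum_b, sum_c = b[0] + b[1], c[0] + c[1]
    return sum_b < sum_c or (sum_b == sum_c and b[0] <= c[0])


def strictly_less(leq, b, c):
    """b is strictly less probable than c under the ordering leq."""
    return leq(b, c) and not leq(c, b)


S1 = (F(1), F(0))            # the whole first interval
S2 = (F(0), F(1))            # the whole second interval
sliver = (F(1, 1000), F(0))  # a small piece of the first interval

# Lexicographic ordering: even a sliver of S_1 is more probable than all of
# S_2, so no partition of S has every element at most as probable as S_2.
not_fine_witness = strictly_less(leq_lex, S2, sliver)

# Sum ordering: S_1 and S_2 have equal sums, yet S_2 is strictly less
# probable than S_1, so the two are not equivalent.
not_tight_witness = strictly_less(leq_sum, S2, S1)
```

Both witnesses evaluate to True. Showing that S_1 and S_2 are nevertheless almost equivalent in the second ordering requires the argument about non-null events and is not attempted in the sketch.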

Combining the ideas of the two preceding examples, it is easy to
exhibit a qualitative probability that is neither fine nor tight but is such
that S can be divided into arbitrarily many equally probable events.
Thus all the questions raised in the opening paragraph of this section
are answered in the affirmative.

To get a feeling for the question whether literally all sets should be
regarded as measurable, suppose that S is a cube of unit volume and
that the probability measure P that strictly agrees with ≤ is such that
the probability of a parallelepiped is equal to its volume. It follows
that the probability of any set having Jordan content is its Jordan
content, but, if a set has not Jordan content, a continuum of
possibilities is still open. Though other possibilities are conceivable, it
is not unnatural to consider an idealized person for whom the numerical
probability attached to each Borel set, or even each Lebesgue measurable
set, is its Lebesgue measure. To go further and take seriously
comparisons between sets that are not Lebesgue measurable, or even between
those that are not Borel measurable, seems to me to be without any
implication bearing on reality. I suppose it might be argued, on the
contrary, that there is no feature of reality that can properly be
interpreted by postulating that the person is able to compare only sets
from a sufficiently narrow field, so that it is simpler and more elegant to
admit all sets. The question seems to be one of taste, but the following
remark illustrates what I consider an awkwardness in supposing
probability to be attached to all sets. It would seem, at first glance, that
the person should be able, if he is so constituted, to regard all pairs of
geometrically congruent sets for which he makes any comparison at all as
equivalent, but the famous Banach-Tarski paradox [B6] shows that
this cannot be done if all sets are regarded as measurable. I think it a
little more graceful to abstain from comparison between the more
bizarre sets than to give up, or even much modify, my everyday notions
about the symmetry of such probability problems associated with
geometry.

If one is unwilling to insist on comparison between every pair of
sets, or events, then, in the same spirit, it is inappropriate to insist on
comparison between every pair of acts. All that has been, or is to be,
formally deduced in this book concerning preferences among sets, could
be modified, mutatis mutandis, so that the class of events would not
be the class of all subsets of S, but rather a Borel field, that is, a
σ-algebra, on S; the set of all consequences would be a measurable space,
that is, a set with a particular σ-algebra singled out; and an act would
be a measurable function from the measurable space of events to the
measurable space of consequences. Indeed, the whole thing could be
done for abstract σ-algebras without reference to sets at all, and this
might have some actual advantage, since it would make possible the
identification of events with propositions in almost any formal language,
even one unable to formulate at all the complete descriptions I call
states.

It may seem peculiar to insist on σ-algebras as opposed to finitely
additive algebras even in a context where finitely additive measures are
the central object, but countable unions do seem to be essential to some
of the theorems of § 3, for example, the terminal conclusions of
Theorem 3.2 and Part 5 of Theorem 3.3.

So much of the modern mathematical theory of probability depends
on the assumption that the probability measures at hand are countably
additive that one is strongly tempted to assume countable additivity,
or its logical equivalent, as a postulate to be adjoined to P1-6. But I
am inclined to agree with de Finetti [D2], [D4] and Koopman [K9],
[K10], [K11] that, however convenient countable additivity may be,
it, like any other assumption, ought not be listed among the postulates
for a concept of personal probability unless we actually feel that its
violation deserves to be called inconsistent or unreasonable. I know of
no argument leading to the requirement of countable additivity, and
many of us have a strong intuitive tendency to regard as natural
probability problems about the necessarily only finitely additive uniform
densities on the integers, on the line, and on the plane. It therefore seems
better not to assume countable additivity outright as a postulate, but
to recognize it as a special hypothesis yielding, where applicable, a large
class of useful theorems.

5 Conditional probability, qualitative and quantitative

Conditional preferences among acts in the light of a given event were
introduced in § 2.7. Since the relation ≤ among events has been
defined in terms of the corresponding relation among acts, we may well
expect to attach meaning to statements of the form B ≤ C given D,
provided that D is not null. The natural way to do so is to take a pair
of acts f and g that test whether B ≤ C (as prescribed by the definition
of ≤ between acts in § 2) and say that B ≤ C given D, if and only if
f ≤ g given D. Since there is more than one pair of acts f, g by which
the proposition B ≤ C can be tested, it is at first sight conceivable that
not all such pairs would be in the same order given D, which would
frustrate the proposed definition of ≤ given D. However, it is easily seen
that for any f, g testing B ≤ C, f ≤ g given D (D not null) is
equivalent to B ∩ D ≤ C ∩ D. Thus it is seen not only that the proposed
definition is unambiguous, but also that it is expressible in terms of
probability comparisons among sets, without direct reference to acts
at all; and, still further, that the postulates P1-6 apply to the
conditional preference relation ≤ given D among acts. This preamble
sufficiently motivates the following definition and easy theorem about
qualitative probability relations generally.

If ≤· is a qualitative probability, and 0 <· D, then B ≤· C given
D, if and only if B ∩ D ≤· C ∩ D.

Theorem 1 If ≤· is a qualitative probability, then so is ≤· given
D. If in addition ≤· is fine or tight, then ≤· given D is correspondingly
fine or tight.

If ≤· is fine, then, for any D that is not null, there exists, in view of
Theorem 3.3, one and only one probability measure P(B | D), the
(conditional) probability of B given D, that almost agrees with ≤·
given D. But, just as one would expect from the traditional study of
numerical probability, and as may be easily verified, P(B ∩ D)/P(D)
considered as a function of B for fixed D is a probability measure that
almost agrees with ≤· given D. Therefore,

(1)    $P(B \mid D) = \dfrac{P(B \cap D)}{P(D)}.$

As was explained in § 2.7, preference among acts given B can
suggestively be expressed in temporal terms. Analogously, the comparison
among events given B and, therefore, conditional probability given B
can be expressed temporally. Thus P(C | B) can be regarded as the
probability the person would assign to C after he had observed that B
obtains. It is conditional probability that gives expression in the theory
of personal probability to the phenomenon of learning by experience.

In accordance with established usage, a pair of events B, C are called
independent if P(B ∩ C) = P(B)P(C). More generally, a set of events
are called independent, if for every finite set of them, say B_1, ···, B_n,

(2)    $P\Bigl(\bigcap_{i} B_i\Bigr) = \prod_{i} P(B_i).$

Obviously, if B is not null, B and C are independent, if and only if
P(C | B) = P(C), in which case B may fairly be called irrelevant to C.
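
Formula (1) and the definition of independence can be checked on a small finite example. The sketch below assumes, for illustration only, a fair six-sided die whose states carry exact weights; the helper names prob, cond_prob, and independent are hypothetical.

```python
from fractions import Fraction


def prob(event, weights):
    """P(event) as the sum of the weights of its states."""
    return sum(weights[s] for s in event)


def cond_prob(b, d, weights):
    """P(B | D) = P(B and D) / P(D), defined only when D is not null."""
    p_d = prob(d, weights)
    if p_d == 0:
        raise ValueError("the conditioning event is null")
    return prob(b & d, weights) / p_d


def independent(b, c, weights):
    """True when P(B and C) = P(B) P(C)."""
    return prob(b & c, weights) == prob(b, weights) * prob(c, weights)


die = {face: Fraction(1, 6) for face in range(1, 7)}  # assumed fair die
even = {2, 4, 6}
low = {1, 2}
high = {4, 5, 6}

# "even" is irrelevant to "low": conditioning leaves the probability unchanged.
same = cond_prob(low, even, die) == prob(low, die)
```

In this example the events even and low are independent, whereas even and high are not, since P(even ∩ high) = 1/3 while P(even)P(high) = 1/4.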

The notions of independence and irrelevance have, so far as I can
see, no analogues in qualitative probability; this is surprising and
unfortunate, for these notions seem to evoke a strong intuitive response.
The absence of these analogues is traceable to the absence of a
qualitative analogue for propositions of the form P(B | C) ≤ P(G | H).
Working under a rather different motivation from that which guides this
book, B. O. Koopman [K9], [K10], and [K11] has developed a system of
qualitative probability in which it is meaningful to compare B given C
with G given H. It is true also that for qualitative probability, even as
it is defined here, some interconditional comparisons might be
naturally defined. If, for example, B ≤· ~B given C and ~G ≤· G given
H, it would not be unreasonable to establish the convention that B
given C ≤· G given H. This sort of extension is not, however, highly
pertinent to my purpose, for here I have little interest in qualitative
probabilities, except as a foundation for quantitative probability.

The following partition formula is well known and easy to prove:

(3)    $P(C) = \sum_{i} P(C \mid B_i)\,P(B_i),$

where B_i is a partition of S into non-null sets. If, further, C is not
null, it is also trivial to derive the celebrated Bayes' rule (or theorem),

(4)    $P(B_i \mid C) = \dfrac{P(C \mid B_i)\,P(B_i)}{P(C)} = \dfrac{P(C \mid B_i)\,P(B_i)}{\sum_{j} P(C \mid B_j)\,P(B_j)}.$

Illustrations of these formulas are found in all elementary textbooks on
probability, as well as in later sections of this book.
Finally, if neither B nor C is null,

       $\dfrac{P(B \mid C)}{P(B)} = \dfrac{P(C \mid B)}{P(C)} = \dfrac{P(B \cap C)}{P(B)\,P(C)},$

which may be given the suggestive reading: Knowledge of C modifies
the probability of B by the same factor by which knowledge of B
modifies the probability of C.
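
A small worked instance of the partition formula (3), Bayes' rule (4), and the symmetric factor identity may be helpful. The numbers below are invented for illustration, and the function names total_probability and bayes are hypothetical; exact fractions are used so that the identities hold without rounding.

```python
from fractions import Fraction as F


def total_probability(prior, likelihood):
    """Partition formula (3): P(C) = sum over i of P(C | B_i) P(B_i)."""
    return sum(likelihood[i] * prior[i] for i in prior)


def bayes(prior, likelihood):
    """Bayes' rule (4): P(B_i | C) for every element B_i of the partition."""
    p_c = total_probability(prior, likelihood)
    if p_c == 0:
        raise ValueError("C is null, so the conditional probabilities are undefined")
    return {i: likelihood[i] * prior[i] / p_c for i in prior}


# Invented numbers: P(B_i) and P(C | B_i) for a threefold partition.
prior = {1: F(1, 2), 2: F(1, 3), 3: F(1, 6)}
likelihood = {1: F(1, 10), 2: F(3, 10), 3: F(6, 10)}

p_c = total_probability(prior, likelihood)
posterior = bayes(prior, likelihood)

# Factor by which knowledge of C modifies P(B_1), and conversely.
factor_on_b = posterior[1] / prior[1]
factor_on_c = likelihood[1] / p_c
```

Here P(C) = 1/4 and the posterior probabilities are 1/5, 2/5, and 2/5; the two factors computed at the end coincide, as the final identity requires.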

The concept of random variable enters into almost any discussion of
probability. Experts are fairly well agreed on the following definition.
A random variable is a function x attaching a value x(s) in some set
X to every s in a set S on which a probability measure P is defined.†
Such an S together with the measure P is called a probability space.

Real-valued random variables are the most familiar, though in
general the values X can be things of any sort. If, for example, x and y,
with values in X and Y, respectively, are random variables on the
same measure space, a new random variable z = {x, y} is defined by
setting z(s) = {x(s), y(s)}. The values of z are thus elements of what
is called X × Y (read the cartesian product of X and Y), the set of
ordered pairs with first element in X and second in Y. The same sort
of thing can be done, of course, for ordered n-tuples and also for infinite
sequences of random variables.

† In many applications of the theory of probability, not all subsets of S or of X
are considered measurable. It is then required as part of the definition of random
variable that x be measurable, i.e., that for every measurable Y ⊂ X, the set of
s's such that x(s) ∈ Y be measurable.

Two random variables x and y defined on the same measure space S
are called (statistically) independent; if and only if, for every X_0 ⊂ X
and Y_0 ⊂ Y, the two events (i.e., subsets of S) defined by the
conditions x(s) ∈ X_0 and y(s) ∈ Y_0, respectively, are independent.† The
extension of this definition from pairs to any number of random variables
is obvious.
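
The definition of independent random variables can be tested directly when S is finite. The sketch below assumes, for illustration only, that S consists of the four equally weighted outcomes of two coin tosses; the helper names subsets and independent_rv are hypothetical, and every subset X_0 and Y_0 of the observed values is enumerated.

```python
from fractions import Fraction
from itertools import chain, combinations, product


def subsets(values):
    """All subsets of a finite collection of values, as tuples."""
    vals = list(values)
    return chain.from_iterable(combinations(vals, k) for k in range(len(vals) + 1))


def independent_rv(x, y, weights):
    """Check P(x in X0 and y in Y0) = P(x in X0) P(y in Y0) for all X0, Y0."""
    x_values = {x(s) for s in weights}
    y_values = {y(s) for s in weights}
    for x0 in subsets(x_values):
        for y0 in subsets(y_values):
            p_x = sum(w for s, w in weights.items() if x(s) in x0)
            p_y = sum(w for s, w in weights.items() if y(s) in y0)
            p_xy = sum(w for s, w in weights.items() if x(s) in x0 and y(s) in y0)
            if p_xy != p_x * p_y:
                return False
    return True


# Assumed probability space: two tosses, four equally weighted states.
two_tosses = {s: Fraction(1, 4) for s in product("HT", repeat=2)}


def first(s):
    return s[0]


def second(s):
    return s[1]


def heads(s):
    return s.count("H")
```

On this space the first and second tosses are independent, while the first toss and the number of heads are not.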

6 The approach to certainty through experience

In § 3, the theory of personal probability was, from the purely
mathematical point of view, reduced to that of probability measures, a
subject that has been elaborately studied, more or less explicitly, for
centuries. Any mathematical problem concerning personal probability is
necessarily a problem concerning probability measures (the study of
which is currently called by mathematicians mathematical probability)
and conversely. The particular outlook and interpretation implicit
in a personalistic concept of probability leads, however, to problems
that, though perfectly meaningful for mathematical probability, might
not otherwise have been emphasized. This section and the succeeding
one each briefly discuss one such problem. These two problems are
selected from among many possibilities for the insight they provide
into the concept of personal probability.
Before studying these problems, it is necessary to be conversant with
the material in Appendixes 1 and 2, which is used in the immediate
sequel and often throughout the rest of this book.
As was brought out in § 5, the person learns by experience. The
purpose of the present section is to explore with a moderate degree of
generality how he typically becomes almost certain of the truth, when
the amount of his experience increases indefinitely. To be specific,
suppose that the person is about to observe a large number of random
variables, all of which are independent given B_i for each i, where the
B_i are a partition of S. It is to be expected intuitively, and will soon
be shown, that under general conditions the person is very sure that
after making the observation he will attach a probability of nearly 1 to
whichever element of the partition actually obtains.

† Where not all sets are measurable, X_0 and Y_0 must, of course, be required to
be measurable.

To describe the situation formally, let B_i be a partition of S with
P(B_i) = β(i). Let x_r, r = 1, 2, ···, be a sequence of random variables,
each taking on only a finite number of values (which can without loss
of generality be thought of as integers). The restriction to a finite set
of values could be removed, but to do so would raise problems of
mathematical technique that, however interesting, are rather beside the
point of this book. Let x denote the first n of the random variables x_r.
It is to be borne in mind that x depends on n, so, strictly speaking, it
should be written x(n). The assumption that, given B_i, the x_r's all have
the same distribution is expressed by

(1)    $P(\mathbf{x}_r(s) = x_r \mid B_i) = \xi(x_r \mid i),$

where ξ(x_r | i) is defined by the context. Combining (1) with the
assumption that the x_r's are independent given B_i,

(2)    $P(x \mid B_i) =_{\mathrm{Df}} P(\mathbf{x}(s) = \{x_1, \ldots, x_n\} \mid B_i) = \prod_{r=1}^{n} \xi(x_r \mid i),$

where a conventional symbol has been used for equal by definition.

These hypotheses having been laid down, it follows from Bayes' rule
and the partition formula, (5.4) and (5.3), that

(3)    $P(B_i \mid x) = \dfrac{P(x \mid B_i)\,P(B_i)}{P(x)} = \dfrac{\beta(i) \prod_{r} \xi(x_r \mid i)}{P(x)}$

and

(4)    $P(x) = \sum_{i} \beta(i) \prod_{r} \xi(x_r \mid i).$


In connection with (3), it may be observed in passing that, if the a priori
probability, β(i), of B_i is 0, then, no matter what value x is observed,
the a posteriori probability of B_i, P(B_i | x), is also 0. This is an
example of the general principle that, if some event is regarded as
virtually impossible, then no evidence whatsoever can lend it credibility.
Similarly, (3) implies the equally common-sense principle that, if an
observation x is virtually impossible on the hypothesis (i.e., given) B_i,
and x is observed, then B_i becomes virtually impossible a posteriori.
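
The a posteriori probabilities of (3) and (4) are easily computed when the partition and the set of values are finite. The following sketch uses invented numbers; the function name posterior and the dictionary layout (beta[i] for β(i), xi[i][v] for ξ(v | i)) are assumptions made for the illustration.

```python
from fractions import Fraction as F
from math import prod


def posterior(beta, xi, observed):
    """P(B_i | x) from (3), with P(x) computed from (4).

    beta[i] is the a priori probability of B_i, xi[i][v] is the probability
    of the value v for any single observation given B_i, and observed is the
    list of values x_1, ..., x_n.
    """
    joint = {i: beta[i] * prod(xi[i][v] for v in observed) for i in beta}
    p_x = sum(joint.values())
    if p_x == 0:
        raise ValueError("the observation has probability zero")
    return {i: j / p_x for i, j in joint.items()}


# Invented example with three hypotheses about a 0/1-valued observation.
beta = {1: F(1, 2), 2: F(1, 2), 3: F(0)}
xi = {
    1: {0: F(1, 2), 1: F(1, 2)},
    2: {0: F(1), 1: F(0)},
    3: {0: F(1, 4), 1: F(3, 4)},
}

after_two_zeros = posterior(beta, xi, [0, 0])
after_a_one = posterior(beta, xi, [0, 0, 1])
```

In this example B_3, which has a priori probability 0, keeps probability 0 whatever is observed, and B_2, under which the value 1 is impossible, drops to probability 0 as soon as a 1 is observed.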
It is particularly interesting to compare the probability of two
elements of the partition, say B_1 and B_2 for definiteness, in the light
of x. From (3),

(5)    $\dfrac{P(B_1 \mid x)}{P(B_2 \mid x)} = \dfrac{\beta(1)}{\beta(2)} \prod_{r=1}^{n} \dfrac{\xi(x_r \mid 1)}{\xi(x_r \mid 2)} = \dfrac{\beta(1)}{\beta(2)} \prod_{r=1}^{n} R'(x_r) = \dfrac{\beta(1)}{\beta(2)}\,R(x),$

where self-explanatory abbreviations have been introduced. Equation
(5) is meaningless, if both the numerator and denominator of its
left-hand side vanish. If the denominator alone vanishes, the fraction may
properly be regarded as infinite. This will happen, if and only if B_2 is
null, and B_1 is not null, given x. That is, it will happen if and only if
β(1) ≠ 0, β(2) = 0, or if β(1) ≠ 0, and R(x) = ∞.
In modern statistical usage, R'(x_r) and R(x) are the likelihood ratios
of B_1 to B_2 given x_r and x, respectively, quantities of importance in
many theoretical contexts.
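
As a sketch of how the likelihood ratio behaves, the function below computes R(x) for a finite list of observed values. It assumes that R(x) is the ratio of the product of the ξ(x_r | 1) to the product of the ξ(x_r | 2), and it treats the ratio as infinite when the denominator alone vanishes. The name likelihood_ratio and the dictionary layout are hypothetical.

```python
from fractions import Fraction as F
from math import inf, prod


def likelihood_ratio(xi_1, xi_2, observed):
    """R(x): likelihood ratio of B_1 to B_2 given the observed values.

    xi_1[v] and xi_2[v] are the probabilities of the value v for a single
    observation given B_1 and B_2, respectively (assumed layout).
    """
    numerator = prod(xi_1[v] for v in observed)
    denominator = prod(xi_2[v] for v in observed)
    if denominator == 0:
        if numerator == 0:
            raise ValueError("the ratio is meaningless: both terms vanish")
        return inf  # the denominator alone vanishes
    return numerator / denominator


xi_1 = {0: F(1, 2), 1: F(1, 2)}
xi_2 = {0: F(1, 4), 1: F(3, 4)}
xi_never_one = {0: F(1), 1: F(0)}

r_finite = likelihood_ratio(xi_1, xi_2, [0, 0, 1])
r_infinite = likelihood_ratio(xi_1, xi_never_one, [0, 1])
```

For the first pair of distributions the ratio after observing 0, 0, 1 is 8/3; for the second pair a single observed 1 makes the ratio infinite.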

If a person contemplates making the observation x, that is, finding
out the value of x(s) for the s that is the true state of the world, it may
properly be asked how probable he considers it that R will turn out to
have a particular value. It will be shown, barring two banal
exceptions, that, for n sufficiently large, the probability, given B_1, that R
is greater than any preassigned number is almost 1. The possibility
P(B_1) = 0 is to be excepted, for then the conditional probability in
question is meaningless. The other exception occurs when ξ(x_r | 1) =
ξ(x_r | 2) for every x_r, that is, when the common distribution of x_r given
B_1 is the same as it is given B_2; for then observation of x_r is simply
irrelevant in distinguishing B_1 from B_2, or, a little more technically, x_r
is irrelevant to B_1 given B_1 ∪ B_2; and

(6)    $P(R'(x_r) = 1 \mid B_1) = 1.$

Formally, it is to be demonstrated that, unless P(B_1) = 0, or (6)
holds,

(7)    $\lim_{n \to \infty} P(R(x) > \rho \mid B_1) = 1 \quad \text{for } 0 < \rho < \infty.$

The problem is quite simple when account is taken of the fact that
R(x) is the product of n random variables, R'(x_r), that are independent
given B_1. In attacking the problem, two cases are to be distinguished,
according as there are or are not values of x_r that have positive
probability given B_1 but zero probability given B_2.
It is in practice rather fortunate to find instances of the first case,
for then (7) applies with a vengeance. Indeed, suppose that

(8)    $P(R'(x_r) < \infty \mid B_1) = \phi, \quad \phi < 1.$

Then, since R(x) is finite only if every one of its n independent factors
is finite,

(9)    $P(R = \infty \mid B_1) = 1 - \phi^{n},$

which obviously approaches 1 with increasing n.

The second case, namely φ = 1, is more interesting. Since much is
known about sums of identically distributed independent random
variables, it is natural to investigate

(10)   $\log R(x) = \sum_{r} \log R'(x_r),$

thereby replacing a product by a sum. It is easily seen from the
definition of R'(x_r) that P(R'(x_r) > 0 | B_1) = 1; so, in the case now at
hand, the functions log R'(x_r) are independent real bounded random
variables.
Letting

(11)   $I = E(\log R'(x_r) \mid B_1),$

the weak law of large numbers † implies that, for any ε > 0,

(12)   $\lim_{n \to \infty} P(\log R(x) > n(I - \epsilon) \mid B_1) = 1,$

equivalently,

(13)   $\lim_{n \to \infty} P(R(x) > e^{n(I - \epsilon)} \mid B_1) = 1.$

The objective will therefore be achieved, if it is demonstrated that
I > 0 unless (6) holds. But

(14)   $I = E(\log R'(x_r) \mid B_1) \ge -\log E(R'^{-1}(x_r) \mid B_1) = -\log 1 = 0,$

as may be argued thus. The inequality in the above calculation is
assigned as Exercise 8 in Appendix 2, together with the fact that equality
can hold in (14) if and only if R'^{-1}(x_r) is constant with probability
one given B_1. But the expected value of R'^{-1}(x_r) given B_1 is equal to
1, as (14) asserts and as may be easily verified from the definition of
R'^{-1}(x_r). So, barring the exceptions provided for, I > 0, and the
demonstration of (7) is complete.
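
The steps behind (14) may be spelled out as follows. This is only a sketch, under the assumption that R'(x_r) stands for the ratio ξ(x_r | 1)/ξ(x_r | 2) and that the sums run over those values x_r with ξ(x_r | 1) > 0; the inequality used is Jensen's inequality for the concave logarithm.

```latex
\begin{align*}
E\bigl(R'^{-1}(x_r) \mid B_1\bigr)
  &= \sum_{x_r} \xi(x_r \mid 1)\,\frac{\xi(x_r \mid 2)}{\xi(x_r \mid 1)}
   = \sum_{x_r} \xi(x_r \mid 2) \le 1, \\
I = E\bigl(\log R'(x_r) \mid B_1\bigr)
  &= -E\bigl(\log R'^{-1}(x_r) \mid B_1\bigr)
   \ge -\log E\bigl(R'^{-1}(x_r) \mid B_1\bigr) \ge 0 .
\end{align*}
```

The first sum equals 1 exactly when every value that is possible given B_2 is also possible given B_1; otherwise it is smaller than 1, which only strengthens the conclusion that I is positive.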

Before the observation, the probability that the probability given x
of whichever element of the partition actually obtains will be greater
than α is

(15)   $\sum_{i} \beta(i)\,P\bigl(P(B_i \mid x) > \alpha \mid B_i\bigr),$

where summation is confined to those i's for which β(i) ≠ 0.
Application of (14) (extended to arbitrary pairs of i's) shows that the
coefficient of each β(i) in the quantity (15), and therefore the quantity
itself, approaches 1 as n increases, provided only that no two functions
ξ(x_r | i) and ξ(x_r | i') are the same, if β(i) and β(i') are both different
from zero.

† For the definition of this law, see, if necessary, p. 191 of Feller's book [F1].

To summarize informally, it has now been shown that, with the
observation of an abundance of relevant data, the person is almost
certain to become highly convinced of the truth, and it has also been shown
that he himself knows this to be the case.
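
The approach of the quantity (15) to 1 can be watched numerically in a simple special case. The sketch below assumes, purely for illustration, that each observation takes the value 1 with probability theta[i] given B_i and 0 otherwise, so that the number of 1's among the n observations summarizes x. The function name prob_convinced is hypothetical.

```python
from math import comb


def prob_convinced(beta, theta, n, alpha):
    """Quantity (15) when, given B_i, each x_r is 1 with probability theta[i].

    beta[i] is the a priori probability of B_i. The outer sum is confined to
    hypotheses with beta[i] != 0; the inner sum runs over k, the number of
    1's among the n observations.
    """
    total = 0.0
    for i, b in enumerate(beta):
        if b == 0:
            continue
        for k in range(n + 1):
            # Probability of one particular sequence with k ones, per hypothesis.
            like = [t ** k * (1 - t) ** (n - k) for t in theta]
            p_x = sum(bj * lj for bj, lj in zip(beta, like))
            if p_x > 0 and b * like[i] / p_x > alpha:
                total += b * comb(n, k) * like[i]
    return total


beta = [0.5, 0.5]
theta = [0.4, 0.6]
values = [prob_convinced(beta, theta, n, alpha=0.9) for n in (10, 100, 400)]
```

With these invented numbers the probability of ending up more than 90 per cent convinced of the true hypothesis is about 0.17 for n = 10 and exceeds 0.99 for n = 400.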

It may be remarked, for those familiar with certain theorems, that
many refinements of (7) and its consequences could be worked out by
application of the strong law of large numbers, the central limit
theorem, and the law of the iterated logarithm to R'(x_r).

The quantity I is coming to be called the information of the
distribution of x_r given B_1 with respect to the distribution of x_r given B_2.
More generally, if P and Q are probability measures, confined (for
simplicity) to a finite set X with elements x, the information of P with
respect to Q is defined by

(16)   $\sum_{x} P(x) \log \dfrac{P(x)}{Q(x)}.$

This usage stems from work of Claude Shannon in communication
engineering, a good account of which is given in [S11], and also from
independent work of Norbert Wiener in a related context [W10]. The ideas
of Shannon and of Wiener, though concerned with probability, seem
rather far from statistics. It is, therefore, something of an accident
that the term "information" coined by them should be not altogether
inappropriate in statistics. The situation is still further confused,
because, as long ago as 1925, R. A. Fisher emphasized an important
notion, which he called "information," in connection with the theory of
estimation (Paper 11, Theory of statistical estimation, in [F6]). At first
glance, Fisher's notion seems quite different from that of Shannon and
Wiener, but, as a matter of fact, his is a limiting form of theirs. A
useful but rather technical exposition relating the several senses of
"information" is given by Kullback and Leibler [K15], and I return to the
topic in § 15.6.
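
The information (16) of P with respect to Q is simple to compute on a finite set. The sketch below makes two conventions that are assumptions of the illustration rather than statements of the text: terms with P(x) = 0 contribute nothing, and the result is taken to be infinite when Q(x) = 0 for some x with P(x) > 0. The function name information is hypothetical.

```python
from math import inf, log


def information(p, q):
    """Information of P with respect to Q: sum of P(x) log(P(x) / Q(x)).

    p and q map the elements of a finite set to their probabilities.
    Natural logarithms are used.
    """
    total = 0.0
    for x, p_x in p.items():
        if p_x == 0:
            continue  # convention: such terms contribute nothing
        q_x = q.get(x, 0)
        if q_x == 0:
            return inf  # P gives weight to a value that Q excludes
        total += p_x * log(p_x / q_x)
    return total


fair = {0: 0.5, 1: 0.5}
biased = {0: 0.25, 1: 0.75}
one_sided = {0: 1.0, 1: 0.0}

i_fair_biased = information(fair, biased)
```

The information of a distribution with respect to itself is 0, in agreement with the exceptional case (6), and it is positive whenever the two distributions differ.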

7 Symmetric sequences of events 

A problem often posed by statisticians is to estimate from a sequence
of observations the unknown probability p that repeated trials of some
sort are successful. On an objectivistic view, this problem is natural
and important, for on such a view the probability that a coin falls heads,
for example, is a property of the coin that can be determined by
experimentation with the coin and in no other way. But on a personalistic
view of probability, strictly interpreted, no probability is unknown to
the person concerned, or, at any rate, he can determine a probability
only by interrogating himself, not by reference to the external world.

This situation has been interpreted to imply that the personalistic
view is wrong, or at any rate inadequate, because it apparently cannot
even express one of the most natural and typical problems of statistics.
Thus far in this book, I have not argued against the possibility of
defining some useful notion of objective probability, but have contented
myself with presenting a particular notion of personal probability.
Therefore, at this point it might be tempting to seek a dualistic theory
admitting both objective and personal probabilities in some kind of
articulation with one another. De Finetti [D3] has shown, however,
that it is not necessary to do so, that the notion of a coin with unknown
probability p can be reinterpreted in terms of personal probability
alone.

The present section is devoted to outlining this development due to
de Finetti. In the organization of the book as a whole, it plays no
logically essential part; it is, rather, a digression intended to give a
clearer understanding of the notion of personal probability, especially in
relation to objectivistic views. The ideas presented here are but a
fragment of those on the same subject in [D2].

Let $\mathbf{x}_r$ be a sequence of random variables taking only the
values 0 and 1. The $\mathbf{x}_r$'s are, to all intents and purposes, a
sequence of events, the rth of which is the event that
$\mathbf{x}_r(s) = 1$. To say that these events are independent, each
occurring with probability p, is to say that the probability of any finite
pattern, $x_1, \ldots, x_n$, initiating the sequence $\mathbf{x}_r(s)$ is
given by the formula

(1)  $P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n \mid p) = p^y (1 - p)^{n-y},$

where y is the number of 1's among the $x_r$'s for $r = 1, \ldots, n$.

Mixtures, in a certain sense, of sequences of random variables are
often of interest, as they already have been in the preceding section.
Suppose, to be explicit, that the world is partitioned by $B_i$, and that,
given $B_i$, the $\mathbf{x}_r$'s are independent with
$P(\mathbf{x}_r(s) = 1 \mid B_i)$ having some fixed value $p(i)$. Then the
unconditional probability of a particular initial sequence is a mixture
of the probabilities given by (1) thus:

(2)  $P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n) = \sum_i P(B_i)\, p(i)^y (1 - p(i))^{n-y}.$

It is natural to generalize (2) formally thus:

(3)  $P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n) = \int p^y (1 - p)^{n-y}\, dM(p),$

where M is a probability measure on the real numbers in the interval
[0, 1].
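
A small numerical sketch may make (2) and (3) concrete. It assumes a hypothetical discrete mixing measure M concentrated on two values of p (the dictionary `M` and the function name are illustrative only, not part of the text) and checks that the probability of an initial pattern depends on the pattern only through its number of 1's, although the trials are not independent under the mixture.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete mixing measure M: weight 1/2 on p = 1/4 and on p = 3/4.
M = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}

def pattern_probability(pattern, mixing=M):
    """Probability of an initial 0/1 pattern under the mixture, as in (2)."""
    n, y = len(pattern), sum(pattern)
    return sum(w * p**y * (1 - p)**(n - y) for p, w in mixing.items())

# Group the probabilities of all patterns of length 4 by their number of 1's.
by_count = {}
for pattern in product((0, 1), repeat=4):
    by_count.setdefault(sum(pattern), set()).add(pattern_probability(pattern))

# Total probability of all patterns of length 4.
total = sum(pattern_probability(pt) for pt in product((0, 1), repeat=4))

# Under the mixture, two successes are more probable than independence
# with the marginal probability 1/2 would suggest.
p_one = pattern_probability((1,))
p_two = pattern_probability((1, 1))
```

Exact rational arithmetic is used so that equalities can be checked without rounding error.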

It is noteworthy that equation (3), understood to apply for every n,
is equivalent to the condition that the probability that every one of each
prescribed set of n of the $\mathbf{x}_r$'s takes the value 1 is

(4)  $\int p^n\, dM(p).$

This follows by arithmetic induction from the obvious formula

(5)  $P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n)$
       $= P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n;\ \mathbf{x}_{n+1}(s) = 0)$
       $+ P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n;\ \mathbf{x}_{n+1}(s) = 1),$

which applies to any sequence of random variables taking on only the
values 0 and 1.

Equation (3) can very well have an interpretation in such terms that
the measure M is not merely an abstract probability measure, but is
actually a personal probability. Thus, if p is a random variable that
is (for a given person) distributed according to M, and, if for each p
the conditional distribution of the $\mathbf{x}_r$'s given p is independent,
with $P(\mathbf{x}_r(s) = 1 \mid p) = p$, then (3) obtains. Strictly
speaking, the notion of conditional probability as it occurs in the
preceding sentence is used in a somewhat wider sense than has been defined
in this book, for the probability of any particular p will typically be
zero. At least for countably additive measures, the necessary extension of
conditional probability and conditional expectation is presented by
Kolmogoroff in [K7]; it is a concept of the greatest value in advanced
mathematical statistics and in probability generally.

However, in most contexts where objectivists speak of an unknown
probability p, there is, so far as an exclusively personalistic view of
probability is concerned, no unknown parameter that can play the role
of p in (3).

Examination of situations in which "unknown" probability is appealed
to, whether justifiably or not, shows that, from the personalistic
standpoint, they always refer to symmetric sequences of events in the
sense of the following definition. The sequence of random variables
$\mathbf{x}_r$, taking only the values 0 and 1, is a symmetric† sequence,
if and only if the probability that any b of the $\mathbf{x}_r(s)$'s equal 1
and any c other $\mathbf{x}_r(s)$'s equal 0 depends only on the integers
b and c.

† De Finetti uses the French word for "equivalent."

It is easy to verify that any mixture of independent sequences in the
sense of (3) is a symmetric sequence. De Finetti has discovered that
the converse is also true. These conclusions can be formally summarized
thus:

Theorem 1. A sequence of random variables $\mathbf{x}_r$, taking only the
values 0 and 1, is symmetric, if and only if there exists a probability
measure M on the interval [0, 1] such that the probability that any
prescribed n of the $\mathbf{x}_r(s)$'s equal 1 is given by (4). Two such
measures, M and M', must be essentially the same,† in the sense that, if B
is a subinterval of [0, 1], then M(B) = M'(B).

Considering that de Finetti has published a proof of Theorem 1 in
[D2] based on the Fourier integral, that any proof of it must be rather
technical, and that the theorem is not the basis of any formal inference
later in this book, it seems best not to prove it here.‡

It is Theorem 1 that makes it possible to express propositions
referring to unknown probabilities in purely personalistic terms. If, for
example, a statistician were to say, "I do not know the p of this coin,
but I am sure it is at most one half," that would mean in personalistic
terms, "I regard the sequence of tosses of this coin as a symmetric
sequence, the measure M of which assigns unit measure to the interval
[0, 1/2]." This condition on M means in turn that for every n the
(personal) probability of n consecutive heads is at most $2^{-n}$, as is
easily verified. I do not insist that propositions couched in terms of a
fictitious unknown probability are bad, if understood as suggestive
abbreviations, but only that the meaningfulness of such propositions does
not constitute an inadequacy of the personalistic view of probability.
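
The verification is short and may be worth writing out. Assuming, as in Theorem 1, that the probability of n consecutive heads is the nth moment of M, and that M assigns unit measure to the interval [0, 1/2], the integrand is bounded by $2^{-n}$ wherever M has any mass:

```latex
\int_{[0,1]} p^n \, dM(p)
  \;=\; \int_{[0,\,1/2]} p^n \, dM(p)
  \;\le\; \left(\tfrac{1}{2}\right)^{n} M\!\left(\left[0, \tfrac{1}{2}\right]\right)
  \;=\; 2^{-n}.
```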

† Technical note: If "probability measure" were here understood to mean a
countably additive probability measure on the Borel sets of [0, 1], the
theorem would remain true, and the essential uniqueness of M would become
true uniqueness.

‡ Technical note: Theorem 1 can be proved very quickly and naturally by
applying the theory of the Hausdorff moment problem (pp. 8-9 of [S13]) to
M, but this method does not seem to generalize readily.

The mathematical concept of probability measure or, a trifle more
generally, bounded measure is fundamental to mathematics generally.
Probability measures, often under other names, are, therefore, employed
in many parts of pure and applied mathematics completely unrelated to
probability proper. For example, the distribution of mass in a not
necessarily rigid body is expressed by a bounded measure that tells how
much of the body is in each region of space. We must, therefore, not be
surprised if, even in studying probability itself, we come across some
probability measures used not to measure probability proper but only for
auxiliary purposes. In the event that p is not actually an unknown
parameter, the measure M presented by Theorem 1 seems at first sight to
be such a purely auxiliary measure, but, as a matter of fact, M does
measure certain interesting probabilities, at least approximately. For
example, letting


(6)  $\bar{\mathbf{x}}_n(s) = \frac{1}{n} \sum_{r=1}^{n} \mathbf{x}_r(s),$

it can be shown that

(7)  $\lim_{n \to \infty} P(\bar{\mathbf{x}}_n(s) < \delta) = M(p < \delta).$


In words, the person considers the average of any large number of
future observations to be distributed approximately the way p is
distributed by M. This is an extension of the ordinary weak law of large
numbers, proved in [D2] along with a corresponding extension of the
strong law.
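
The limit in (7) can be illustrated by exact computation in a simple case. The sketch below assumes a hypothetical discrete M with mass at p = 1/4 and p = 3/4, and a threshold 1/2 at which M has no mass; all names are illustrative. It compares the probability that the average of the first n terms falls below the threshold with the measure M assigns to values of p below it.

```python
from fractions import Fraction
from math import comb

# Hypothetical mixing measure: weight 1/3 on p = 1/4, weight 2/3 on p = 3/4.
M = {Fraction(1, 4): Fraction(1, 3), Fraction(3, 4): Fraction(2, 3)}

def prob_mean_below(n, delta, mixing=M):
    """Exact probability that the mean of the first n terms is below delta."""
    total = Fraction(0)
    for p, w in mixing.items():
        for y in range(n + 1):
            if Fraction(y, n) < delta:
                total += w * comb(n, y) * p**y * (1 - p)**(n - y)
    return total

def measure_below(delta, mixing=M):
    """M(p < delta) for the discrete mixing measure."""
    return sum(w for p, w in mixing.items() if p < delta)

delta = Fraction(1, 2)
# Absolute discrepancy between the two sides of (7) for increasing n.
gaps = [abs(prob_mean_below(n, delta) - measure_below(delta))
        for n in (10, 40, 160)]
```

The discrepancy is a few percent at n = 10 and negligible by n = 160, in keeping with the "at least approximately" of the text.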

If the first n terms of a symmetric sequence are observed, how does
the rest of the sequence appear to the person in the light of this
observation? In the first place, it also is a symmetric sequence but
generally of a structure different from that of the original sequence, as
may be shown thus: Let

(8)  $\pi(y, n - y) =_{\mathrm{Df}} P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n),$

as one may for a symmetric sequence. Then

(9)  $P(\mathbf{x}_q(s) = x_q;\ q = n + 1, \ldots, n + m \mid \mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n)$
       $= \frac{P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n + m)}{P(\mathbf{x}_r(s) = x_r;\ r = 1, \ldots, n)}$
       $= \frac{\pi(y + z, (n - y) + (m - z))}{\pi(y, n - y)},$

where z is the number of 1's among the $x_q$'s, $q = n + 1, \ldots, n + m$.

Equation (9) shows that the sequence $\mathbf{x}_q$, $q > n$, given that
$\mathbf{x}_r(s) = x_r$, $r = 1, \ldots, n$, is a new symmetric sequence
characterized by

(10)  $\pi'(z, m - z) =_{\mathrm{Df}} \frac{\pi(y + z, (n - y) + (m - z))}{\pi(y, n - y)}.$


The measure M' associated with the new sequence is, according to
Theorem 1, essentially determined by the condition that

(11)  $\int p^m\, dM'(p) = \pi'(m, 0) = \frac{\pi(m + y, n - y)}{\pi(y, n - y)} = \int p^m\, \frac{p^y (1 - p)^{n-y}}{\pi(y, n - y)}\, dM(p).$


Equation (11) makes it plausible that, except for the slight ambiguity
permitted by Theorem 1, M' is defined (for Borel sets B) by

(12)  $M'(B) = \pi^{-1}(y, n - y) \int_B p^y (1 - p)^{n-y}\, dM(p),$

and this can in fact be demonstrated with some appeal to slightly
advanced methods pertaining to the Hausdorff moment problem (pp. 8-9
of [S13]).
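
For a measure M concentrated on finitely many values of p, (12) reduces to reweighting each value by the probability it gives the observed pattern. The following sketch assumes such a hypothetical discrete M (the names `pi`, `update`, and the particular weights are illustrative) and checks the moment condition (11) against it.

```python
from fractions import Fraction

def pi(y, n_minus_y, mixing):
    """pi(y, n - y): probability of a prescribed pattern of y ones and n - y zeros."""
    return sum(w * p**y * (1 - p)**n_minus_y for p, w in mixing.items())

def update(mixing, y, n):
    """The measure M' of (12) for a discrete M, after y ones in n observations."""
    norm = pi(y, n - y, mixing)
    if norm == 0:
        raise ValueError("observed pattern has probability zero under M")
    return {p: w * p**y * (1 - p)**(n - y) / norm for p, w in mixing.items()}

# Hypothetical prior measure; the value p = 9/10 is given measure zero.
M = {
    Fraction(1, 4): Fraction(1, 2),
    Fraction(1, 2): Fraction(1, 4),
    Fraction(3, 4): Fraction(1, 4),
    Fraction(9, 10): Fraction(0),
}
y, n = 3, 4
M_post = update(M, y, n)

# Left and right sides of (11) for m = 2.
m = 2
lhs = sum(w * p**m for p, w in M_post.items())
rhs = pi(m + y, n - y, M) / pi(y, n - y, M)

# A measure concentrated on a single value of p is left unchanged.
point_mass = {Fraction(1, 3): Fraction(1)}
unchanged = update(point_mass, 2, 5)
```

Note that the value of p assigned measure zero by M is still assigned measure zero by M', whatever the observations.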

It is noteworthy that, if M(B) = 0, then M'(B) = 0 also. In the
event that p really is an unknown parameter, this means that, if the
person is virtually certain that the true p is not in B, no amount of
evidence can alter that opinion.

Equation (12) shows that M' is generally different from M. Indeed,
for fixed n > 1, M' is clearly the same as M for every y for which
$\pi(y, n - y) > 0$, if and only if M assigns the measure 1 to some one
value of p. That is, the person regards evidence drawn from a symmetric
sequence as irrelevant to the future behavior of the sequence, if and
only if at the outset he regards the sequence not merely as symmetric
but also as independent.

It can be shown that the person regards it as highly probable that,
if he observes a sufficiently long segment of a symmetric sequence, the
continuation of the sequence will then be one for which the conditional
variance of p,

(13)  $\int p^2\, dM'(p) - \left\{ \int p\, dM'(p) \right\}^2,$

will be small. In the event that p is really an unknown parameter, this
implies that the person is very sure that after a long sequence of
observations he will assign nearly unit probability to the immediate
neighborhood of the value of p that actually obtains, a parallel to the
approach to certainty discussed in § 6.
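
One way to see the variance (13) shrink is to average it over the patterns the person might observe. The sketch below assumes a hypothetical two-point M and computes, exactly, the expected value of (13) after n observations; the function names are illustrative, and the averaging over outcomes is a device of this example rather than the argument referred to in the text.

```python
from fractions import Fraction
from math import comb

# Hypothetical mixing measure: equal weight on p = 1/4 and p = 3/4.
M = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}

def posterior(mixing, y, n):
    """Discrete form of (12): the measure M' after y ones in n observations."""
    weights = {p: w * p**y * (1 - p)**(n - y) for p, w in mixing.items()}
    norm = sum(weights.values())
    return {p: w / norm for p, w in weights.items()}

def variance(mixing):
    """The quantity (13) for a discrete measure."""
    mean = sum(w * p for p, w in mixing.items())
    return sum(w * p**2 for p, w in mixing.items()) - mean**2

def expected_posterior_variance(mixing, n):
    """Average of (13) over the number y of ones among n observations."""
    total = Fraction(0)
    for y in range(n + 1):
        prob_y = sum(w * comb(n, y) * p**y * (1 - p)**(n - y)
                     for p, w in mixing.items())
        if prob_y > 0:
            total += prob_y * variance(posterior(mixing, y, n))
    return total

values = [expected_posterior_variance(M, n) for n in (0, 5, 50)]
```

With no observations the value is simply the variance of p under M; it falls steadily as n grows.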



CHAPTER 4 

Critical Comments 
on Personal Probability 

1 Introduction 

It is my tentative view that the concept of personal probability
introduced and illustrated in the preceding chapter is, except possibly
for slight modifications, the only probability concept essential to
science and other activities that call upon probability. I propose in this
chapter to discuss the shortcomings I see in that particular
personalistic view of probability, which, for brevity, shall here be called
simply "the personalistic view"; to point out briefly the relationships
between it and other views; to criticize other views in the light of it;
and to discuss the criticisms holders of other views have raised, or may
be expected to raise, against it.

From the standpoint of strict logical organization such critical
remarks are somewhat premature, because the personalistic view itself
insists that probability is concerned with consistent action in the face
of uncertainty. Consequently, until the theory of such action has been
completely outlined in later chapters, the view to be criticized cannot
even be considered to have been wholly presented. Practically, however,
it seems wise not to confine critical comments to the one part of
the text that logic may suggest as appropriate, but rather to touch on
criticism from time to time, even at the cost of some repetition. Thus,
some of what is to be said here has already been said in the introductory
chapter and elsewhere, and some of it will be said again.

Views other than the personalistic view are to be discussed here, but
it cannot be too distinctly emphasized that the account given of them
will be very superficial.† One function of discussing other views is to
provide the reader with at least some orientation in the large and
diversified body of ideas pertaining to the foundation of statistics that
have been accumulated. A less obvious, but I think no less important
and legitimate, function is to cast new light on the personalistic view,
especially for those who already hold, or tend to hold, other views.

† Much more extensive comparative material is given by Keynes [K4], by
Nagel [N1], and by Carnap [C1]. Koopman [K12] should also be mentioned in
this connection.

2 Some shortcomings of the personalistic view

I can answer, to my own satisfaction, some criticisms of the
personalistic view that have been brought to my attention. These points
are discussed later in the chapter, but in this section I state and
discuss as clearly as I can those that I find more difficult and confusing
to answer.

According to the personalistic view, the role of the mathematical
theory of probability is to enable the person using it to detect
inconsistencies in his own real or envisaged behavior. It is also
understood that, having detected an inconsistency, he will remove it. An
inconsistency is typically removable in many different ways, among which
the theory gives no guidance for choosing. Silence on this point does
not seem altogether appropriate, so there may be room to improve the
theory here. Consider an example. The person finds on interrogating
himself about the possible outcome of tossing a particular coin five
times that he considers each of the thirty-two possibilities equally
probable, so each has for him the numerical probability 1/32. He also
finds that he considers it more probable that there will be four or five
heads in the five tosses than that the first two tosses will both be heads.
Now, reference to the mathematical theory of probability soon shows
the person that, if the probability of each of the thirty-two possibilities
is 1/32, then the probability of four or five heads out of five is 6/32,
and the probability that the first two tosses will be heads is 8/32, so
the person has caught himself in an inconsistency. The theory does not
tell him how to resolve the inconsistency; there are literally an infinite
number of possibilities among which he must choose.
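
The arithmetic of the example can be checked by brute enumeration. The sketch below simply lists the thirty-two outcomes, gives each the probability 1/32, and adds up the two events in question; the helper name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 32 sequences of heads (H) and tails (T) in five tosses, equally probable.
outcomes = list(product("HT", repeat=5))
prob = Fraction(1, len(outcomes))

def probability(event):
    """Total probability of the outcomes for which event(outcome) is true."""
    return sum(prob for o in outcomes if event(o))

four_or_five_heads = probability(lambda o: o.count("H") >= 4)
first_two_heads = probability(lambda o: o[0] == "H" and o[1] == "H")
```

The first event comprises six outcomes and the second eight, so the judgment that the first is the more probable cannot stand alongside equal probabilities for all thirty-two outcomes.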

In this particular example, the choice that first comes to my mind,
and I imagine to yours, is to hold fast to the position that all thirty-two
possibilities are equally likely and to accept the implications of that
position, including the implication that four or five heads out of five
is less probable than two heads out of two. I do not think that there is
any justification for that choice implicit in the example as formally
stated, but rather that in the sort of actual situation of which the
example is a crude schematization there generally are considerations not
incorporated in the example that do justify, or at any rate elicit, the
choice.

To approach the matter in a somewhat different way, there seem to
be some probability relations about which we feel relatively "sure" as
compared with others. When our opinions, as reflected in real or
envisaged action, are inconsistent, we sacrifice the unsure opinions to the
sure ones. The notion of "sure" and "unsure" introduced here is vague,
and my complaint is precisely that neither the theory of personal
probability, as it is developed in this book, nor any other device known to
me renders the notion less vague. There is some temptation to introduce
probabilities of a second order so that the person would find himself
saying such things as "the probability that B is more probable than C
is greater than the probability that F is more probable than G." But
such a program seems to meet insurmountable difficulties.

The first of these, pointed out to me by Max Woodbury, is this:
If the primary probability of an event B were a random variable b
with respect to secondary probability, then B would have a "composite"
probability, by which I mean the (secondary) expectation of b.
Composite probability would then play the allegedly villainous role that
secondary probability was intended to obviate, and nothing would have
been accomplished.

Again, once second order probabilities are introduced, the
introduction of an endless hierarchy seems inescapable. Such a hierarchy
seems very difficult to interpret, and it seems at best to make the theory
less realistic, not more.

Finally, the objection concerning composite probability would seem
to apply, even if an endless hierarchy of higher order probabilities were
introduced. The composite probability of B would here be the limit
of a sequence of numbers, $E_n(\cdots E_2(P_1(B)) \cdots)$, a limit that
could scarcely be postulated not to exist in any interpretable theory of
this sort. The reader may wish to evaluate for himself the arguments
in favor of such a hierarchy put forward by Reichenbach (Chapter 8,
[R2]), taking proper account of the differences between Reichenbach's
overall view, and his mathematical theory, of probability on one hand
and, on the other, the personalistic view and measure-theoretic
mathematical theory that are the basis of my critique of higher order
probabilities.

The interplay between the "sure" and "unsure" is interestingly
expressed by de Finetti (p. 60, [D2]) thus: "The fact that a direct
estimate of a probability is not always possible is just the reason that
the logical rules of probability are useful. The practical object of these
rules is simply to reduce an evaluation, scarcely accessible directly, to
others by means of which the determination is rendered easier and more
precise."

It may be clarifying, especially for some readers under the sway of
the objectivistic tradition, to mention that, if a person is "sure" that
the probability of heads on the first toss of a certain penny is 1/2, it
does not at all follow that he considers the coin fair. He might, to take
an extreme example, be convinced that the penny is a trick one that
always falls heads or always falls tails.

Logic, to which the theory of personal probability can be closely
paralleled, is similarly incomplete. Thus, if my beliefs are inconsistent
with each other, logic insists that I amend them, without telling me how
to do so. This is not a derogatory criticism of logic but simply a part
of the truism that logic alone is not a complete guide to life. Since the
theory of personal probability is more complete than logic in some
respects, it may be somewhat disappointing to find that it represents no
improvement in the particular direction now in question.

A second difficulty, perhaps closely associated with the first one,
stems from the vagueness associated with judgments of the magnitude
of personal probability. The postulates of personal probability imply
that I can determine, to any degree of accuracy whatsoever, the
probability (for me) that the next president will be a Democrat. Now, it is
manifest that I cannot really determine that number with great
accuracy, but only roughly. Since, as is widely recognized, all the
interesting and useful theories of modern science, for example, geometry,
relativity, quantum mechanics, Mendelism, and the theory of perfect
competition, are inexact, it may not at first sight seem disquieting that
the theory of personal probability should also be somewhat inexact. As
will immediately be explained, however, the theory of personal
probability cannot safely be compared with ordinary scientific theories in
this respect.

I am not familiar with any serious analysis of the notion that a theory
is only slightly inexact or is almost true, though philosophers of science
have perhaps presented some. Even if valid analyses of the notion
have been made, or are made in the future, for the ordinary theories of
science, it is not to be expected that those analyses will be immediately
applicable to the theory of personal probability, normatively
interpreted, because that theory is a code of consistency for the person
applying it, not a system of predictions about the world around him.

The difficulty experienced in § 2.6 with defining indifference seems
closely associated with the difficulty about vagueness raised here.

Another difficulty with the theory of personal probability (or, more
properly, with that larger theory of the behavior of a person in the
face of uncertainty, of which the theory of personal probability is a
part) is that the statement of the theory is not yet necessarily complete.
Thus we shall in the next chapter come upon another proposition that
demands acceptance as a postulate, and, since even this leaves the
person a great deal of freedom, there is no telling when someone will come
upon still another postulate that clamors to be adjoined to the others.
Strictly speaking, this is not so much an objection to the theory as a
warning about what to expect of its future development.

3 Connection with other views 

All views of probability are rather intimately connected with one
another. For example, any necessary view can be regarded as an extreme
personalistic view in which so many criteria of consistency have been
invoked that there is no role left for the person's individual judgment.
Again, objectivistic views can be regarded as personalistic views
according to which comparisons of probability can be made only for very
special pairs of events, and then only according to such criteria that all
(right-minded) people agree in their comparisons.

From a different standpoint, personalistic views lie not between, but
beside, necessary and objectivistic views, for both necessary and
objectivistic views may, in contrast to personalistic views, be called
objective in that they do not concern individual judgment.

4 Criticism of other views 

It will throw some light on the personalistic view to say briefly how
some other views seem to compare unfavorably with it.

It is one of my fundamental tenets that any satisfactory account of
probability must deal with the problem of action in the face of
uncertainty. Indeed, almost everyone who seriously considers probability,
especially if he has practical experience with statistics, does sooner or
later deal with that problem, though often only tacitly. Even some
personalistic views seem to me too remote from the problem of action,
or decision. For example, de Finetti in [D2] gives two approaches to
personal probability. Of these, one is almost exactly like the view
sponsored here, except only that the notion "more probable than" is
supposed to be intuitively evident to the person, without reference to
any problem of decision. The other is more satisfactory in this
respect, being couched in terms of betting behavior, but it seems to me
a somewhat less satisfactory approach than the one sponsored here,
because it must assume either that the bets are for infinitesimal sums or,
anticipating the language of the next chapter, that the utility of money
is linear. The theory expressed by Koopman in [K9], [K10], and [K11]
and that expressed by Good in [G2] are both personalistic views that
tend to ignore decision, or at any rate keep it out of the foreground;
but the personalistic view expressed by Ramsey in [R1], like the one
sponsored here, takes decision as fundamental. If any necessary view
can be formulated at all, it might well be possible to formulate it in
terms of decision, but, so far as I know, the notion of decision has not
appeared fundamental to the holders of any necessary view. It seems
fair to say that objectivistic views, by their very nature, must in
principle regard decision as secondary to probability, if relevant at all.
Yet, the objectivist A. Wald has done more than anyone else to
popularize the notion of decision.

As has already been indicated, from the position of the personalistic
view, there is no fundamental objection to the possibility of
constructing a necessary view, but it is my impression that that possibility
has not yet been realized, and, though unable to verbalize reasons, I
conjecture that the possibility is not real. Two of the most prominent
enthusiasts of necessary views are Keynes, represented by [K4], and
Carnap, who has begun in [C1] to state what he hopes will prove a
satisfactory necessary (or nearly necessary) view of probability. Keynes
indicated in the closing pages of [K4] that he was not fully satisfied
that he had solved his problem and even suggested that some element
of objectivistic views might have to be accepted to achieve a
satisfactory theory, and Carnap regards [C1] as only a step toward the
establishment of a satisfactory necessary view, in the existence of which
he declares confidence. That these men express any doubt at all about the
possibility of narrowing a personalistic view to the point where it
becomes a necessary one, after such extensive and careful labor directed
toward proving this possibility, speaks loudly for their integrity; at the
same time it indicates that the task they have set themselves, if
possible at all, is not a light one.

Keynes, writing in 1921 of what are here called objectivistic views,
complained, "The absence of a recent exposition of the logical basis of
the frequency theory by any of its adherents has been a great
disadvantage to me in criticizing it." (Chap. VIII, Sec. 17, of [K4].) I
believe that his complaint applies as aptly to my position today as to his
then, though I cannot pretend to have combed the intervening literature
with anything like the thoroughness Keynes himself would have
employed. Reichenbach, to be sure, presents in great detail an
interesting view that must be classified as objectivistic [R2], but it seems
far removed from those that dominate modern statistical theory and form
the main subject of the following discussion. Whatever objectivistic
views may be, they seem, to holders of necessary and personalistic
views alike, subject to two major lines of criticism. In the first place,
objectivistic views typically attach probability only to very special
events. Thus, on no ordinary objectivistic view would it be meaningful,
let alone true, to say that on the basis of the available evidence it
is very improbable, though not impossible, that France will become a
monarchy within the next decade. Many who hold objectivistic views
admit that such everyday statements may have a meaning, but they
insist, depending on the extremity of their positions, that that meaning
is not relevant to mathematical concepts of probability or even to
science generally. The personalistic view claims, however, to analyze
such statements in terms of mathematical probability, and it considers
them important in science and other human activities.

Secondly, objectivistic views are, and I think fairly, charged with
circularity. They are generally predicated on the existence in nature
of processes that may, to a sufficient degree of approximation, be
represented by a purely mathematical object, namely an infinite sequence
of independent events. This idealization is said, by the objectivists
who rely on it, to be analogous to the treatment of the vague and
extended mark of a carpenter's pencil as a geometrical point, which is so
fruitful in certain contexts. When it is pointed out to the objectivist
that he uses the very theory of probability in determining the quality
of the approximation to which he refers, he retorts that the applied
geometer, a fictitious character whose reputation for solidity in science
is unquestioned, likewise uses geometry in determining the quality of
his approximations. Let the geometer then be challenged, and he
replies with a threefold reference to experience, saying, "It is a common
experience that with sufficient experience one develops good judgment
in the use of geometry and thenceforth generally experiences success in
the predictions he bases on it." "Now," says the objectivist, "the
geometer's answer is my answer." But it seems to critics of objectivistic
views that, though the geometer may be entitled to make as many
allusions to experience as he pleases, the probabilist is not free to do so,
precisely because it is the business of the probabilist to analyze the
concept of experience. He, therefore, cannot properly support his position
by alluding to experience until he has analyzed that concept, though
he can, of course, allude to as many experiences as he wishes.

Two sorts of mixed views call for special comment here.

First, some (among them Carnap [C1]; Koopman [K9], [K10], and
[K11]; and Nagel [N1]) hold that two probability concepts play a role
in inference, an objectivistic one and a personalistic or a necessary one.
This dualism is typically justified as necessary to the analysis of such
a concept as that of a coin with unknown probability of falling heads.
But, as § 3.7 explains, de Finetti has provided a satisfactory analysis
on the basis of personal probability alone.

Second, others (for example, van Dantzig [V1] and Féraud [F2]),
finding the conventional objectivistic views circular for the reasons I
have cited, try to break the circle by relatively isolated use of
subjective ideas. Very crudely, it seems to be their position that in any
one context it is allowable for a person to act as though some one event
of sufficiently small (objective) probability, chosen at his discretion,
were impossible. Quite apart from the relatively technical question of
whether any consistent mixed view of this kind can be constructed,
holders of personalistic and necessary views alike criticize them as
unnecessarily timid, for they embrace subjective ideas, but only gingerly.

5 The role of symmetry in probability

An important and highly controversial question in the foundations 
of probability is whether and, if so, how symmetry considerations can 
determine the probabilities of at least some events 

Symmetry considerations have always been important in the study
of probability. Indeed, early work in probability was dominated by
the notion of symmetry, for it was usually either concerned with, or
directly inspired by, symmetrical gambling apparatus such as dice or
cards. To illustrate those classical problems, suppose that a gambler is
offered several bets concerning the possible outcome of rolling three
dice, where it is to be understood that refraining from any bets at all
may be among the available "bets." Which of the available bets
should the gambler choose? Perhaps I distort history somewhat in
insisting that early problems were framed in terms of choice among bets,
for many, if not most, of them were framed in terms of equity, that is,
they asked which of two players, if either, would have the advantage
in a hypothetical bet. But, especially from the point of view of the
earlier probabilists, such a question of equity is tantamount to a
question of choice among bets, for to ask which of two "equal" betters has
the advantage is to ask which of them has the preferable alternative,
as was pointed out quite explicitly by D. Bernoulli in [B10].

In effect, the classical workers recommended the following solution
to the problem of three dice, with corresponding solutions to other
gambling problems:

1. Attach equal mathematical probabilities to each of the 216 ($= 6^3$)
possible outcomes of rolling the three dice. (There are $6^3$ possibilities,
because the first, second, and third dice can each show any of six scores,
all combinations being possible.)

2. Under the mathematical probability established in Step 1,
compute the expected winnings (possibly negative) of the gambler for each
available bet.

3. Choose a bet that has the largest expected winnings among those
available.

At present it is appropriate to refrain from criticisms of the use
made of expected winnings until the next chapter and to concentrate
discussion on the notion that the 216 possibilities should be considered
equally probable, which can conveniently be done by drastically
reducing the class of bets considered to be available. Say, for definiteness,
that the only bets to be considered are simply even-money bets of one
dollar, that the triple of scores falls in a preassigned subset of the 216
possibilities. When attention is focused on this restricted class of bets,
the total recommendation is seen to imply that the probability measure
defined in the first step of the recommendation be adopted as the
personal probability of the gambler. To put it differently, a gambler who
adopts the recommendation will hold the 216 possible outcomes equally
probable not only in some abstract sense, but also in the sense of
personal probability as defined in § 3.2.
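
The three-step recommendation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the particular bets, the helper `even_money_bet`, and all names are hypothetical and do not come from the classical sources; the bets are the one-dollar even-money bets on a preassigned subset described above, plus the option of not betting.

```python
from fractions import Fraction
from itertools import product

# Step 1: equal probability for each of the 6**3 = 216 ordered triples.
OUTCOMES = list(product(range(1, 7), repeat=3))
PROB = Fraction(1, len(OUTCOMES))

def expected_winnings(payoff):
    """Step 2: expected winnings of a bet given as a function outcome -> dollars."""
    return sum(PROB * payoff(o) for o in OUTCOMES)

def even_money_bet(subset):
    """Hypothetical helper: win $1 if the triple falls in `subset`, else lose $1."""
    return lambda outcome: 1 if outcome in subset else -1

# Illustrative bets (assumptions made for this sketch only).
bets = {
    "no bet": lambda o: 0,
    "sum is at least 11": even_money_bet({o for o in OUTCOMES if sum(o) >= 11}),
    "at least one six": even_money_bet({o for o in OUTCOMES if 6 in o}),
}

# Step 3: choose a bet with the largest expected winnings.
values = {name: expected_winnings(bet) for name, bet in bets.items()}
best = max(values, key=values.get)
```

For an even-money bet on a subset, the expected winnings are positive exactly when the subset contains more than half of the 216 triples, which is how the restricted class of bets exposes the uniform probability as the gambler's personal probability.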

The notion that the 216 possibilities should be regarded as equally
probable is familiar to everyone, for it is taken for granted wherever
gentlemen gamble as well as in the standard high-school algebra courses,
where it serves to illustrate the theory of combinations and permutations.

Traditionally, the equality of the probabilities was supposed to be
established by what was called the principle of insufficient reason,†
thus: Suppose that there is an argument leading to the conclusion that
one of the possible combinations of ordered scores, say {1, 2, 3}, is
more probable than some other, say {6, 3, 4}. Then the information
on which that hypothetical argument is based has such symmetry as
to permit a completely parallel, and therefore equally valid, argument
leading to the conclusion that {6, 3, 4} is more probable than {1, 2, 3}.
Therefore, it was asserted, the probabilities of all combinations must
be equal.

The principle of insufficient reason has been and, I think, will
continue to be a most fertile idea in the theory of probability; but it is not
so simple as it may appear at first sight, and criticism has frequently
and justly been brought against it. Holders of necessary views
typically attempt to put the principle on a rigorous basis by modifying it
in such a way as to take account of such criticism. Holders of
personalistic and objectivistic views typically regard the criticism as not
altogether refutable, so they do not attempt to establish a formal postulate
corresponding to the principle but content themselves, as I shall here,
with exhibiting an element of truth in it.

One of the first criticisms is that the principle is not strictly applicable
for a person who has had any experience with the apparatus in
question, or even with similar apparatus. Thus, attempts to use the
principle, as I have stated it, to prove that there is no such thing as a run
of luck at dice, as actually played, are invalid. The person may have
had relevant experience, directly or vicariously, not only with gambling
apparatus itself, but also with people who make and handle it, including
cheaters.

† Perhaps what I here call the principle of insufficient reason should be called the
principle of cogent reason. See Section 3 of [B15] for the distinction involved.

It is not always obvious what the symmetry of the information is in
a situation in which one wishes to invoke the principle of insufficient
reason. For example, d'Alembert, an otherwise great
eighteenth-century mathematician, is supposed to have argued seriously that the
probability of obtaining at least one head in two tosses of a fair coin is 2/3
rather than 3/4. (Cf. [T3], Art. 464.) Heads, as he said, might appear
on the first toss, or, failing that, it might appear on the second, or,
finally, might not appear on either. D'Alembert considered the three
possibilities equally likely.
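
A short enumeration makes the discrepancy concrete. The sketch below assumes the usual model of four equally probable ordered outcomes for two tosses; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# The four equally probable ordered outcomes of two tosses of a fair coin.
tosses = list(product("HT", repeat=2))
p_at_least_one_head = Fraction(sum("H" in t for t in tosses), len(tosses))

# d'Alembert's three cases, with their probabilities under the same model.
cases = {
    "head on first toss": Fraction(sum(t[0] == "H" for t in tosses), len(tosses)),
    "tail, then head": Fraction(sum(t == ("T", "H") for t in tosses), len(tosses)),
    "no head at all": Fraction(sum(t == ("T", "T") for t in tosses), len(tosses)),
}

# Treating the three cases as equally likely yields 2/3 for "at least one head".
p_dalembert = Fraction(2, len(cases))
```

Under the four-outcome model the three cases carry probabilities 1/2, 1/4, and 1/4, so they do not form a uniform partition, and the value 3/4 results.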

It seems reasonable to suppose that, if the principle of insufficient
reason were formulated and applied with sufficient care, the conclusion
of d'Alembert would appear simply as a mistake. There are, however,
more serious examples. Suppose, to take a famous one, that it is known
of an urn only that it contains either two white balls, two black balls,
or a white ball and a black ball. The principle of insufficient reason has
been invoked to conclude that the three possibilities are equally
probable, so that in particular the probability of one white and one black
ball is concluded to be 1/3. But the principle has also been applied to
conclude that there are four equally probable possibilities, namely, that
the first ball is white and the second also, that the first is white and the
second black, etc. On that basis, the probability of one white and one
black ball is, of course, 1/2. Personally, I do not try to arbitrate
between the two conclusions but consider that the existence of the pair
of them reflects doubt on the notion that a person's knowledge relevant
to any matter admits any full and precise description in terms of
propositions he knows to be true and others about which he knows
nothing.
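
The two conflicting applications of the principle to the urn can be put side by side. In this sketch the helper `uniform` and the labels for the possibilities are assumptions made for illustration; the code merely applies "equal probability to every cell" to two different descriptions of the same urn.

```python
from fractions import Fraction
from itertools import product

def uniform(partition):
    """Assign equal probability to each cell of a partition."""
    return {cell: Fraction(1, len(partition)) for cell in partition}

# Description A: three unordered compositions of the urn.
by_composition = uniform(["two white", "one white, one black", "two black"])
p_mixed_a = by_composition["one white, one black"]

# Description B: four ordered colorings (first ball, second ball).
by_order = uniform(list(product("WB", repeat=2)))
p_mixed_b = sum(p for balls, p in by_order.items() if set(balls) == {"W", "B"})
```

The same rule gives 1/3 under one description and 1/2 under the other; nothing in the code, or in the principle as stated, selects between the two partitions.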

Most holders of personalistic views do not find the principle of
insufficient reason compelling, because they envisage the possibility that
a person may consider one event more probable than another without
having any compelling argument for his attitude. Viewed practically,
this position is closely associated with the first criticism of the principle
of insufficient reason, for the holder of a personalistic view typically
supposes that the person is under the influence of experience, and
possibly even biologically determined inheritance, that expresses itself in
his opinions, though not necessarily through compelling argument.

Holders of personalistic views do see some truth in the principle of
insufficient reason, because they recognize that there are frequently
partitions of the world, associated with symmetrical-looking gambling
apparatus and the like, that many and diverse people all consider (very
nearly) uniform partitions. As was illustrated in the preceding
section, we often feel more "sure" about probabilities derived from the
judgment that such partitions are uniform than we do about others.
Such partitions are, moreover, very important in that they provide
some events the probability of which to diverse people is in agreement.
Though the events concerned are often of no importance in themselves,
agreement about them can, through the statistical invention of
randomization, contribute to agreement about all sorts of issues open to
empirical investigation. Widespread though the agreement about the
near uniformity of some partitions is, holders of personalistic views
typically do not find the contexts in which such agreement obtains
sufficiently definable to admit of expression in a postulate.

Holders of purely objectivistic views see no sense at all in the original
formulation of the principle of insufficient reason, for it uses
"probability" in a manner they consider meaningless. But they too see an
element of truth in the principle, which they consider to be established
as a part of empirical physics. Thus, for example, they regard it as an
experimental fact, admitting some explanation in terms of theoretical
physics, that three dice manufactured with reasonable symmetry will
exhibit each of the 216 possible patterns with nearly equal frequency,
if repeatedly rolled with sufficient violence on a suitable surface.

Holders of personalistic views agree that experiments or, more
generally, experiences determine to a large extent when people employ the
idea of insufficient reason. Thus, though experiments with gambling
apparatus, quite apart from gambling itself, have a fascination that
perhaps exceeds their real interest, such experiments are not altogether
worthless. On the one hand, they provide strong evidence that a
person cannot expect to maintain a symmetrical attitude toward any piece
of apparatus with which he has had long experience, unless he is
virtually convinced at the outset that the possible states of the apparatus
are equally probable and independent from trial to trial. To say it in
the more familiar and sometimes more congenial language of objective
probability, long experiments with coins, dice, cards, and the like have
always shown some bias, and often some dependence from trial to trial.
On the other hand (and this has the utmost practical importance), it
has been shown that, with skill and experience, gambling apparatus, or
its statistical equivalent, can be manufactured in which the bias and
the dependence from trial to trial are extremely small. This implies
that groups of very diverse people can be brought to agree that repeated
trials with certain apparatus are nearly uniform and nearly independent.
Thus certain methods of obtaining random numbers and other outcomes
of uniform and independent trials, which are vital to many sorts of
experimentation, have justifiably found acceptance with the scientific
public. A stimulating account of practical methods of obtaining
random numbers, and random samples generally, is given by Kendall in
Chapter 8 (Vol. I) of [K2].

6 How can science use a personalistic view of probability? 

It is often argued by holders of necessary and objectivistic views alike
that that ill-defined activity known as science or scientific method
consists largely, if not exclusively, in finding out what is probably true,
by criteria on which all reasonable men agree. The theory of
probability relevant to science, they therefore argue, ought to be a
codification of universally acceptable criteria. Holders of necessary views say
that, just as there is no room for dispute as to whether one proposition
is logically implied by others, there can be no dispute as to the extent
to which one proposition is partially implied by others that are thought
of as evidence bearing on it, for the exponents of necessary views
regard probability as a generalization of implication. Holders of
objectivistic views say that, after appropriate observations, two reasonable
people can no more disagree about the probability with which trials
in a sequence of coin tosses are heads than they can disagree about the
length of a stick after measuring it by suitable methods, for they
consider probability an objective property of certain physical systems in
the same sense that length is generally considered an objective property
of other physical systems, small errors of measurement being
contemplated in both contexts. Neither the necessary nor the objectivistic
outlook leaves any room for personal differences; both, therefore, look
on any personalistic view of probability as, at best, an attempt to
predict some of the behavior of abnormal, or at any rate unscientific,
people.

I would reply that the personalistic view incorporates all the
universally acceptable criteria for reasonableness in judgment known to me
and that, when any criteria that may have been overlooked are brought
forward, they will be welcomed into the personalistic view. The
criteria incorporated in the personalistic view do not guarantee agreement
on all questions among all honest and freely communicating people,
even in principle. That incompleteness, if one will call it such, does not
distress me, for I think that at least some of the disagreement we see
around us is due neither to dishonesty, to errors in reasoning, nor to
friction in communication, though the harmful effects of the latter are
almost incapable of exaggeration.

As was mentioned in connection with symmetry, there are partitions
that diverse people all consider nearly uniform, though not compelled
to that agreement by any postulate of the theory of personal
probability. As has also been mentioned and as will be explained later
(especially in § 14.8), through the statistical invention of randomization,
agreement about partitions pertaining to gambling apparatus of no
importance in itself can be made to contribute to agreement in every
part of empirical science.

Another mechanism that brings people having some, but not all,
opinions in common into more complete agreement was illustrated in
§§ 3.6-7. Indeed, it was there shown that in certain contexts any two
opinions, provided that neither is extreme in a technical sense, are
almost sure to be brought very close to one another by a sufficiently
large body of evidence.
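
A small numerical sketch may help fix the idea. It is not the argument of §§ 3.6-7; it assumes, purely for illustration, that two people hold opinions about a coin that can be summarized by Beta distributions and that they revise those opinions on the same record of tosses. The prior parameters and the toss counts are hypothetical.

```python
from fractions import Fraction

def posterior_mean(a, b, heads, tails):
    """Mean of a Beta(a, b) opinion about a coin after the observed tosses."""
    return Fraction(a + heads, a + b + heads + tails)

# Two hypothetical people with different, but not extreme, initial opinions.
person_a, person_b = (8, 2), (2, 8)

def gap(heads, tails):
    """Absolute difference between the two people's revised means."""
    return abs(posterior_mean(*person_a, heads, tails)
               - posterior_mean(*person_b, heads, tails))

# The same evidence (60% heads) in growing amounts: 0, 10, 100, 1000 tosses.
gaps = [gap(6 * n, 4 * n) for n in (0, 1, 10, 100)]
```

In this example the difference is 6/(10 + N) after N tosses: it shrinks steadily as evidence accumulates but is never exactly zero, so the two opinions become similar rather than identical.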

It has been countered, I believe, that, if experience systematically
leads people with opinions originally different to hold a common opinion,
then that common opinion, and it only, is the proper subject of
scientific probability theory. There are two inaccuracies in this argument.
In the first place, the conclusion of the personalistic view is not that
evidence brings holders of different opinions to the same opinions, but
rather to similar opinions. In the second place, it is typically true of
any observational program, however extensive but prescribed in
advance, that there exist pairs of opinions, neither of which can be called
extreme in any precisely defined sense, but which cannot be expected,
either by their holders or any other person, to be brought into close
agreement after the observational program.

I have, at least once, heard it objected against the personalistic view
of probability that, according to that view, two people might be of
different opinions, according as one is pessimistic, and the other
optimistic. I am not sure what position I would take in abstract discussion
of whether that alleged property of personalistic views would be
objectionable, but I think it is clear from the formal definition of
qualitative probability that the particular personalistic view sponsored here
does not leave room for optimism and pessimism, however these traits
be interpreted, to play any role in the person's judgment of probabilities.

Utility

1 Introduction 

The postulates P4-6, introduced in Chapter 3, have already led to
simplification of the relation $\le$ in so far as it applies to acts of a special
but important form. Indeed, through the introduction of numerical
probability, those special comparisons have been reduced to ordinary
arithmetic comparison of numbers in such a way that many relations
among acts are deducible by simple and systematic arithmetic
calculation. In this chapter it will be shown that the arithmetization of
comparison among acts can, with the introduction of one mild new
postulate, be extended to virtually all pairs of acts.

This far-reaching arithmetization of comparison among acts is
achieved by attaching a number $U(f)$ to each consequence $f$ in such a
way that $\mathbf{f} \le \mathbf{g}$ if and only if the expected value of $U(\mathbf{f})$ is numerically
less than or equal to that of $U(\mathbf{g})$, provided only that the real-valued
functions $U(\mathbf{f})$ and $U(\mathbf{g})$ are essentially bounded. The provision can
fail to be met only if there exist acts that are, so to speak, distinctly
preferable to any fixed reward or distinctly worse than any fixed
punishment.

A function $U$ that thus arithmetizes the relation of preference among
acts will be called a utility. It will be shown that the multiplicity of
utilities is not complicated, every utility being simply related to every
other. I have chosen to use the name "utility" in preference to any
other, in spite of some unfortunate connotations this name has in
connection with economic theory, because it was adopted by von Neumann
and Morgenstern when in [V4] they revived the concept to which it
refers, in a most stimulating way. Their treatment has been of such
widespread interest that the introduction of a name other than "utility" at
the present time would cause more confusion than it could alleviate.

The next three sections are concerned with the technical exploration
of the utility concept. I think readers interested in the details will find
it best to read these sections twice as a unit, in the fashion I have been
recommending for other material in which definitions and propositions
are interlarded with proofs; others will be content with a cursory
reading, omitting proofs.

Taking advantage of the simplicity afforded by the introduction of
utility, I try in § 5 to make some progress with the problem, pointed
out in § 2.5, of specifying criteria for the construction of "small worlds."
Finally, § 6 briefly reports the history of the utility idea. A separate
critical section is not necessary, because the criticisms of the theory of
utility known to me are incorporated conveniently into the historical
section.

2 Gambles 

Before discussing utility, it is expedient to establish certain facts,
the first being that at least among a rather rich class of acts, namely
acts confined with probability one to a finite number of consequences,
preference depends only on the probability distribution of the
consequences of the acts.

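Theorem 1

Hyp. 1. $f_1, \ldots, f_n$ are $n$ elements of $F$; $n \ge 1$.

2. $\rho_1, \ldots, \rho_n$ are numbers such that $\sum \rho_i = 1$.

3. $\mathbf{g}$ and $\mathbf{h}$ are acts such that

$$P(g(s) = f_i) = P(h(s) = f_i) = \rho_i; \qquad i = 1, \ldots, n.$$

Concl. $\mathbf{g} \doteq \mathbf{h}$.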

Proof. The theorem is obvious for $n = 1$. It will be proved by
induction, supposing henceforth that $n > 1$.
Let $B$ denote the intersection of the two events that $g(s) = f_n$ and
$h(s) \ne f_n$, and let $C$ denote the intersection of the two events that
$h(s) = f_n$ and $g(s) \ne f_n$. It is easy to see that $P(B) = P(C)$. $C$ can
be partitioned into $C_0, C_1, \ldots, C_{n-1}$, where $C_0$ is a null event and $C_i$,
$i = 1, \ldots, n - 1$, is the intersection of $C$ with the event that $g(s) = f_i$.

By repeated application of Conclusion 7 of Theorem 3.3.3, $B$ can be
partitioned into events $B_0, B_1, \ldots, B_{n-1}$ such that $P(B_i) = P(C_i)$,
$i = 0, \ldots, n - 1$.

Let $\mathbf{g}_0 = \mathbf{g}$, and define $\mathbf{g}_{i+1}$ step by step for $i = 0, \ldots, n - 2$ thus:

$$g_{i+1}(s) = \begin{cases} f_n & \text{for } s \in C_{i+1}, \\ f_{i+1} & \text{for } s \in B_{i+1}, \\ g_i(s) & \text{elsewhere.} \end{cases} \tag{1}$$

It is easily seen from the facts of conditional probability that $\mathbf{g}_{i+1} \doteq \mathbf{g}_i$
given $B_{i+1} \cup C_{i+1}$, and it is even more obvious that $\mathbf{g}_{i+1} \doteq \mathbf{g}_i$ given
$\sim(B_{i+1} \cup C_{i+1})$. Therefore $\mathbf{g}_{i+1} \doteq \mathbf{g}_i$, so $\mathbf{g}_{n-1} \doteq \mathbf{g}$. Furthermore,
$P(g_{i+1}(s) = f_j) = P(g_i(s) = f_j) = \rho_j$, so $P(g_{n-1}(s) = f_j) = \rho_j$,
$j = 1, \ldots, n$. Thus $\mathbf{g}_{n-1}$ is not only equivalent to $\mathbf{g}$ but also satisfies the
hypothesis of the theorem relative to $\mathbf{h}$, so it will suffice to prove the
theorem for $\mathbf{g}_{n-1}$ and $\mathbf{h}$ in place of $\mathbf{g}$ and $\mathbf{h}$.
Now $\mathbf{g}_{n-1}$ has been constructed to equal $f_n$ in $C$, except on a null set.
Therefore $\mathbf{g}_{n-1} \doteq \mathbf{h}$ given $C \cup D$, where $D$ is the subset of $\sim C$ on
which $g_{n-1} = h = f_n$.

It remains only to show that $\mathbf{g}_{n-1} \doteq \mathbf{h}$ given $\sim(C \cup D)$. If $\sim(C \cup D)$
is null, that is true automatically; henceforth concentrate on the less
trivial situation. If $\sim(C \cup D)$ is not null, then $\le$ given $\sim(C \cup D)$
satisfies all the postulates assumed thus far, and therefore the
consequences $f_1, \ldots, f_{n-1}$, the numbers $\rho_i' = \rho_i/(1 - \rho_n)$, $i = 1, \ldots, n - 1$,
the acts $\mathbf{g}_{n-1}$ and $\mathbf{h}$, and the relation $\le$ given $\sim(C \cup D)$ satisfy the
hypothesis of the theorem for a case in which it is supposed already to
have been proved. ♦

In this chapter the notation $\sum \rho_i f_i$ will denote the class of all acts $\mathbf{f}$
for which there exist partitions $B_i$ of $S$ such that $P(B_i) = \rho_i$ and $f(s) = f_i$
for $s \in B_i$. Here the $f_i$'s are a finite sequence of consequences (not
necessarily distinct), and the $\rho_i$'s a corresponding sequence of
non-negative real numbers such that $\sum \rho_i = 1$. In view of Conclusion 7 of
Theorem 3.3.3, such a class of acts, which will in this chapter be
referred to as a gamble and denoted by $\boldsymbol{f}$, $\boldsymbol{g}$, $\boldsymbol{h}$, or the like, always has at
least one element. Theorem 1 says, in effect, that the person regards
all elements of any gamble as equivalent. To put it differently, if the
events $B_i$ of a partition have the probabilities $\rho_i$, and if the act $\mathbf{f}$ is
such that the consequence $f_i$ will befall the person in case $B_i$ occurs,
then the value of $\mathbf{f}$ is independent of how the partition $B_i$ is chosen.

Gambles can be mixed, in a sense, to make new gambles, thus: Let
$\boldsymbol{f}_j$ be a finite sequence of gambles,

$$\boldsymbol{f}_j = \sum_i \rho_{ij} f_{ij}, \tag{2}$$

and $\sigma_j$ a corresponding sequence of non-negative real numbers such
that $\sum \sigma_j = 1$. The mixture of the $\boldsymbol{f}_j$'s with weights $\sigma_j$, denoted $\sum \sigma_j \boldsymbol{f}_j$, is
defined by

$$\sum_j \sigma_j \boldsymbol{f}_j = \sum_j \sigma_j \sum_i \rho_{ij} f_{ij} = \sum_{i,j} (\sigma_j \rho_{ij}) f_{ij}, \tag{3}$$

which is meaningful, the $f_{ij}$'s being consequences and the $\sigma_j \rho_{ij}$'s being
numbers such that $\sum (\sigma_j \rho_{ij}) = 1$. Such mixtures are exemplified by an
insurance policy in which the benefit is an annuity payable during the
life of the beneficiary, and by a lottery in which the prizes are tickets
in other lotteries.
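
The definition of a mixture can be sketched in Python. The representation of a gamble as a dictionary from consequences to probabilities, the function name `mix`, and the lottery numbers are all assumptions made for illustration; equal consequences are merged, which is harmless because the consequences of a gamble need not be distinct.

```python
from fractions import Fraction
from collections import defaultdict

def mix(gambles, weights):
    """Mixture sum_j sigma_j f_j of gambles given as {consequence: probability}."""
    assert sum(weights) == 1 and all(w >= 0 for w in weights)
    mixture = defaultdict(Fraction)
    for gamble, sigma in zip(gambles, weights):
        for consequence, rho in gamble.items():
            mixture[consequence] += sigma * rho   # coefficient sigma_j * rho_ij
    return dict(mixture)

# A lottery whose prizes are tickets in other lotteries (illustrative numbers).
ticket_a = {"$100": Fraction(1, 10), "$0": Fraction(9, 10)}
ticket_b = {"$10": Fraction(1, 2), "$0": Fraction(1, 2)}
compound = mix([ticket_a, ticket_b], [Fraction(1, 4), Fraction(3, 4)])
```

The coefficients of the compound lottery again sum to one, which is the sense in which the mixture is itself a gamble.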

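In view of Theorem 1, it is natural to say that $\boldsymbol{f} \le \mathbf{g}$ means that, for
every act $\mathbf{f}$ in the class of acts corresponding to $\boldsymbol{f}$, $\mathbf{f} \le \mathbf{g}$. Corresponding
definitions are to be understood for $f \le \boldsymbol{g}$, $\boldsymbol{f} \le \boldsymbol{g}$, $\boldsymbol{f} < \boldsymbol{g}$, etc.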

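Theorem 2 If $\boldsymbol{f}$, $\boldsymbol{g}$, and $\boldsymbol{h}$ are gambles, and $0 < \rho \le 1$, then
$\rho\boldsymbol{f} + (1 - \rho)\boldsymbol{h} \le \rho\boldsymbol{g} + (1 - \rho)\boldsymbol{h}$, if and only if $\boldsymbol{f} \le \boldsymbol{g}$.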

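Proof. Let $\mathbf{f}$, $\mathbf{g}$, $f_i$, $g_j$, and $B_i$, $C_j$ be acts, consequences, and
partitions such that $\mathbf{f}$ and $\mathbf{g}$ are among the acts represented by $\boldsymbol{f}$ and $\boldsymbol{g}$,
respectively, with $f(s) = f_i$ for $s \in B_i$ and $g(s) = g_j$ for $s \in C_j$.

Construct $D_{ij} \subset B_i \cap C_j$ such that $P(D_{ij}) = \rho P(B_i \cap C_j)$, and let
$D = \bigcup D_{ij}$. Then $P(D) = \rho$, $P(B_i \mid D) = P(B_i)$, and $P(C_j \mid D) =
P(C_j)$.

What is to be proved is, in effect, that $\mathbf{f} \le \mathbf{g}$ given $D$, if and only if
$\mathbf{f} \le \mathbf{g}$. In view of Theorem 1 it is clear that whether that is so or not
for $\mathbf{f}$ and $\mathbf{g}$ does not depend on the particular choice of $D$, so, with an
obvious temporary extension of terminology, it is to be proved that $\boldsymbol{f} \le \boldsymbol{g}$
given $\rho$, if and only if $\boldsymbol{f} \le \boldsymbol{g}$.

If $\boldsymbol{f} \doteq \boldsymbol{g}$ given $\alpha$ for every $0 < \alpha \le 1$, there is nothing to prove.
Otherwise it can be assumed without loss of generality that, for some
$\alpha_0$, $\boldsymbol{f} < \boldsymbol{g}$ given $\alpha_0$.

In view of Theorem 2.7.2, if $\alpha + \beta \le 1$, $\boldsymbol{f} \ge \boldsymbol{g}$ given $\alpha$, and $\boldsymbol{f} \ge \boldsymbol{g}$
given $\beta$, then $\boldsymbol{f} \ge \boldsymbol{g}$ given $(\alpha + \beta)$; and similarly $\boldsymbol{f} \ge \boldsymbol{g}$ given $\alpha/2$.

Making use of P6 and Theorem 2.7.2, it can easily be shown that, for
any $\alpha$ sufficiently close to $\alpha_0$, $\boldsymbol{f} < \boldsymbol{g}$ given $\alpha$.

The preceding three paragraphs imply that, in the case at hand,
$\boldsymbol{f} < \boldsymbol{g}$ given $\alpha$ for every $\alpha$, $0 < \alpha \le 1$. ♦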

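Theorem 3 If $\boldsymbol{f} < \boldsymbol{g}$, and $0 \le \sigma < \rho \le 1$, then $\rho\boldsymbol{f} + (1 - \rho)\boldsymbol{g} <
\sigma\boldsymbol{f} + (1 - \sigma)\boldsymbol{g}$.

Proof. In view of the immediately verifiable identities,

$$\begin{aligned}
\rho\boldsymbol{f} + (1 - \rho)\boldsymbol{g} &= (\rho - \sigma)\boldsymbol{f} + [1 - (\rho - \sigma)] \left[ \frac{\sigma}{1 - (\rho - \sigma)}\,\boldsymbol{f} + \frac{1 - \rho}{1 - (\rho - \sigma)}\,\boldsymbol{g} \right], \\
\sigma\boldsymbol{f} + (1 - \sigma)\boldsymbol{g} &= (\rho - \sigma)\boldsymbol{g} + [1 - (\rho - \sigma)] \left[ \frac{\sigma}{1 - (\rho - \sigma)}\,\boldsymbol{f} + \frac{1 - \rho}{1 - (\rho - \sigma)}\,\boldsymbol{g} \right],
\end{aligned} \tag{4}$$

this theorem is a special case of Theorem 2, unless $\rho = 1$ and $\sigma = 0$,
in which case it is trivial. ♦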
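
The two identities used in the proof of Theorem 3 can be checked mechanically by comparing the total weight each side places on the two gambles. The sketch below is only a bookkeeping aid; the function name and the representation of a mixture by a pair of coefficients (weight on the first gamble, weight on the second) are assumptions made for illustration.

```python
from fractions import Fraction

def identity_sides(rho, sigma):
    """Coefficient pairs (weight on f, weight on g) for both sides of each identity."""
    d = rho - sigma                      # weight on which the two mixtures differ
    rest = 1 - d                         # weight of the shared inner mixture
    inner = (sigma / rest, (1 - rho) / rest)
    first = ((rho, 1 - rho), (d + rest * inner[0], rest * inner[1]))
    second = ((sigma, 1 - sigma), (rest * inner[0], d + rest * inner[1]))
    return inner, first, second

inner, first, second = identity_sides(Fraction(3, 4), Fraction(1, 4))
```

The inner mixture is the same in both identities, so the two sides differ only in whether the weight $\rho - \sigma$ falls on the first or the second gamble, which is exactly the situation covered by Theorem 2. The excluded case $\rho = 1$, $\sigma = 0$ would make `rest` zero.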

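Theorem 4 If $\boldsymbol{f}_1 < \boldsymbol{f}_2$ and $\boldsymbol{f}_1 \le \boldsymbol{g} \le \boldsymbol{f}_2$, then there is one and only
one $\rho$ such that $\rho\boldsymbol{f}_1 + (1 - \rho)\boldsymbol{f}_2 \doteq \boldsymbol{g}$.

Proof. It follows immediately from Theorem 3 and the principle of
the Dedekind cut† that there is one and only one $\rho_0$ such that

$$\begin{aligned}
\sigma\boldsymbol{f}_1 + (1 - \sigma)\boldsymbol{f}_2 &< \boldsymbol{g}, \quad \text{if } \sigma > \rho_0, \\
\sigma\boldsymbol{f}_1 + (1 - \sigma)\boldsymbol{f}_2 &> \boldsymbol{g}, \quad \text{if } \sigma < \rho_0.
\end{aligned} \tag{5}$$

According to (5), no number, except possibly $\rho_0$, can satisfy the
equivalence demanded by the theorem.

Finally, using (5) and P6 (much as it was used in the proof of
Theorem 2), it follows that $\rho_0$ does indeed satisfy the equivalence. ♦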

3 Utility, and preference among gambles 

The idea of utility can most conveniently be introduced in
connection with gambles or, equivalently, acts that with probability one are
confined to a finite number of consequences, thus: A utility is a function
$U$ associating real numbers with consequences in such a way that, if
$\boldsymbol{f} = \sum \rho_i f_i$ and $\boldsymbol{g} = \sum \sigma_j g_j$, then $\boldsymbol{f} \le \boldsymbol{g}$, if and only if
$\sum \rho_i U(f_i) \le \sum \sigma_j U(g_j)$.

Writing $U[\boldsymbol{f}]$ for $\sum \rho_i U(f_i)$, the condition takes the form $U[\boldsymbol{f}] \le U[\boldsymbol{g}]$.
Similarly, it is convenient to understand that, for an act $\mathbf{f}$,

$$U[\mathbf{f}] = E(U(\mathbf{f})). \tag{1}$$

In this notation the following obvious theorem gives a slightly different
characterization of utility.

Theorem 1 A real-valued function of consequences, $U$, is a utility,
if and only if $\mathbf{f} \le \mathbf{g}$ is equivalent to $U[\mathbf{f}] \le U[\mathbf{g}]$, provided $\mathbf{f}$ and $\mathbf{g}$ are
both with probability one confined to a finite set of consequences.
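
The defining condition can be illustrated with a small computation. In the sketch below, gambles are dictionaries from consequences to probabilities, and the consequences and the numbers assigned to them by `U` are hypothetical; the code shows only how a given utility reduces comparison of gambles to comparison of two numbers, not how such a function is obtained.

```python
from fractions import Fraction

def expected_utility(gamble, U):
    """U[f] = sum_i rho_i * U(f_i) for a gamble {consequence: probability}."""
    return sum(rho * U[c] for c, rho in gamble.items())

def not_preferred(f, g, U):
    """True when f <= g in the sense of the definition, i.e. U[f] <= U[g]."""
    return expected_utility(f, U) <= expected_utility(g, U)

U = {"nothing": 0, "umbrella": 3, "holiday": 10}   # hypothetical utility values
f = {"umbrella": Fraction(1)}                        # a sure consequence
g = {"nothing": Fraction(3, 4), "holiday": Fraction(1, 4)}
```

Here $U[\boldsymbol{f}] = 3$ and $U[\boldsymbol{g}] = 5/2$, so a person whose preferences are described by this `U` would not prefer the second gamble to the sure umbrella.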

Do the postulates thus far assumed guarantee that any utilities exist
at all? Can Theorem 1 be extended to an even wider class of acts?
Does a great diversity of utilities exist, or does the relation $\le$
practically determine the function $U$? These questions, here mentioned in
the order in which they most naturally arise, are manifestly of great
importance in understanding utility. For technical reasons, they will
be answered in a different order: the third followed by the first in this
section, and the second in the next section.

† Cf., if necessary, any introduction to the theory of the real numbers for
explanation of this principle, e.g., Chapter II of [G3].

If there is a utility at all, there is surely more than one, because a
utility plus a constant and a utility times a positive constant are also
obviously utilities; thus:

Theorem 2 If $U$ is a utility, and $\rho$, $\sigma$ are real numbers with $\rho > 0$,
then $U' = \rho U + \sigma$ is also a utility.

Corollary 1 If there exists a utility, and if $f < g$, then there
exists a utility $U$ for which $U(f)$ and $U(g)$ are any preassigned pair of
numbers, provided $U(f) < U(g)$.

Theorem 2 says that any increasing linear function of a utility is a
utility. The next theorem says that, conversely, any two utilities are
necessarily increasing linear functions of one another.

Theorem 3 If $U$ and $U'$ are utilities, there exist numbers $\rho$ and $\sigma$
such that $U' = \rho U + \sigma$, $\rho > 0$.


Proof. The first step of the proof will be to demonstrate the
following identity for the two utilities $U$ and $U'$ and for any three
consequences $f$, $g$, $h$:

$$\begin{vmatrix} 1 & 1 & 1 \\ U(f) & U(g) & U(h) \\ U'(f) & U'(g) & U'(h) \end{vmatrix} = 0. \tag{2}$$

If any two of the consequences $f$, $g$, $h$ are equivalent, two columns of
the determinant in question are equal, and therefore the determinant
vanishes. It can be assumed, then, that no two of $f$, $g$, and $h$ are
equivalent; and there is no loss in generality, as may be seen by permuting
columns, in assuming $f < g < h$. Theorem 2.4 now permits the
conclusion that there is a $\rho$ such that $\rho f + (1 - \rho)h \doteq g$. Therefore,

$$\begin{aligned}
1 &= \rho \cdot 1 + (1 - \rho) \cdot 1, \\
U(g) &= \rho U(f) + (1 - \rho)U(h), \\
U'(g) &= \rho U'(f) + (1 - \rho)U'(h).
\end{aligned} \tag{3}$$

Thus the middle column of the determinant is linearly dependent on the
other two, so the determinant vanishes, as was asserted.
Now let $g$ and $h$ be any fixed pair of consequences such that $g < h$,
the existence of such a pair being assured by P5. Equation (2) can be
successively rewritten, where $f$ is an arbitrary consequence, thus:

$$1 \cdot [U(g)U'(h) - U(h)U'(g)] - U(f)[U'(h) - U'(g)] + U'(f)[U(h) - U(g)] = 0, \tag{4}$$

$$U'(f) = \frac{U'(h) - U'(g)}{U(h) - U(g)}\,U(f) - \frac{U(g)U'(h) - U(h)U'(g)}{U(h) - U(g)}, \tag{5}$$

which proves the theorem, for $U'(h) - U'(g)$ and $U(h) - U(g)$ are
both positive. ♦


Corollary 2 If $U$ and $U'$ are utilities such that, for some $g < h$,
$U(g) = U'(g)$ and $U(h) = U'(h)$, then $U$ and $U'$ are the same, that is,
for every $f$, $U(f) = U'(f)$.

To summarize, if there is a utility at all, there are an infinite number,
but the array of utilities is not complicated, for all can be generated
from any one by increasing linear transformations.
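
As a numerical illustration of this summary, the sketch below recovers the coefficients relating two utilities from their values on a single pair of consequences $g < h$, following the form of equation (5) in the proof of Theorem 3. The consequences, the utility values, and the function name are hypothetical.

```python
from fractions import Fraction

def affine_coefficients(U, U2, g, h):
    """Return (rho, sigma) with U2 = rho * U + sigma, given consequences g < h."""
    rho = Fraction(U2[h] - U2[g], U[h] - U[g])
    sigma = -Fraction(U[g] * U2[h] - U[h] * U2[g], U[h] - U[g])
    return rho, sigma

U = {"nothing": 0, "umbrella": 3, "holiday": 10}   # hypothetical utility values
U2 = {c: 5 * u + 32 for c, u in U.items()}          # a second utility for the same person
rho, sigma = affine_coefficients(U, U2, "nothing", "holiday")
```

Once the values on the two reference consequences are fixed, every other value of the second utility is determined, which is the content of Corollary 2.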

Turn now to the question of existence.

Theorem 4 There exists a utility.

Proof. Von Neumann and Morgenstern prove essentially this
theorem, as well as the preceding one, in the appendix of [V4]. The following
proof is theirs, expressed, as the teacher used to say, in my own words.
For this proof only, certain special nomenclature is introduced. A
set of gambles $F$ is convex, if and only if, for every $\boldsymbol{f}, \boldsymbol{g} \in F$ and $\rho$,
$0 < \rho < 1$, $\rho\boldsymbol{f} + (1 - \rho)\boldsymbol{g} \in F$. An interval $I$ of gambles is the set of all
gambles $\boldsymbol{f}$ such that, for some fixed $\boldsymbol{g}$ and $\boldsymbol{h}$ (which determine the interval),
$\boldsymbol{g} \le \boldsymbol{f} \le \boldsymbol{h}$. A hyper-utility $V$ on a convex set $F$ is a real-valued
function of the gambles of $F$, such that $\boldsymbol{f} \le \boldsymbol{g}$, if and only if $V(\boldsymbol{f}) \le V(\boldsymbol{g})$,
and such that $V(\rho\boldsymbol{f} + (1 - \rho)\boldsymbol{g}) = \rho V(\boldsymbol{f}) + (1 - \rho)V(\boldsymbol{g})$.

The following remarks about this special nomenclature are obvious
and will be repeatedly used in the proof, without explicit reference.
The set of all gambles is convex. The intersection of two convex sets
is convex. Every interval is convex. There is an interval containing
any finite set of gambles. If there is a hyper-utility on the set of all
gambles, it is a utility when confined to consequences.

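By the same method that led to the proofs of Theorems 2 and 3,
if there is a hyper-utility on $F$ containing $\boldsymbol{g}$ and $\boldsymbol{h}$, with $\boldsymbol{g} < \boldsymbol{h}$, then there
is one and only one hyper-utility $V$ on $F$ such that $V(\boldsymbol{g}) = 0$ and
$V(\boldsymbol{h}) = 1$.

If $I$ is the interval determined by $\boldsymbol{g} < \boldsymbol{h}$, then, according to Theorem
2.4, there is for every $\boldsymbol{f}$ in $I$ a unique number, call it $V(\boldsymbol{f})$, such that

$$\boldsymbol{f} \doteq (1 - V(\boldsymbol{f}))\boldsymbol{g} + V(\boldsymbol{f})\boldsymbol{h}. \tag{6}$$

By repeated use of Theorem 2.2, it follows for any $\boldsymbol{f}, \boldsymbol{f}' \in I$ that

$$\begin{aligned}
\rho\boldsymbol{f} + (1 - \rho)\boldsymbol{f}' &\doteq \rho\{(1 - V(\boldsymbol{f}))\boldsymbol{g} + V(\boldsymbol{f})\boldsymbol{h}\} + (1 - \rho)\{(1 - V(\boldsymbol{f}'))\boldsymbol{g} + V(\boldsymbol{f}')\boldsymbol{h}\} \\
&= \{1 - [\rho V(\boldsymbol{f}) + (1 - \rho)V(\boldsymbol{f}')]\}\boldsymbol{g} + [\rho V(\boldsymbol{f}) + (1 - \rho)V(\boldsymbol{f}')]\boldsymbol{h};
\end{aligned} \tag{7}$$

so $V$ is a hyper-utility on the convex set $I$.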

From here on in this proof, let $g$, $h$ be a fixed pair of consequences with
$g < h$. Making use of the preceding two paragraphs, there is a unique
hyper-utility assigning the values 0 and 1 to $g$ and $h$, respectively, on
any one interval containing $g$ and $h$. The intersection of two such
intervals is a convex set containing $g$ and $h$, and on the intersection the
hyper-utilities associated with the two intervals are both hyper-utilities
attaching 0 and 1 to $g$ and $h$, respectively; they must, therefore, be
equal to one another on the intersection.

Any gamble $\boldsymbol{f}$ is an element of some interval containing $g$ and $h$.
Let $V(\boldsymbol{f})$ be the common value assigned to $\boldsymbol{f}$ by all the hyper-utilities
that are defined on intervals containing $\boldsymbol{f}$, $g$, and $h$ and that assign the
values 0 and 1 to $g$ and $h$, respectively. Since there is always at least
one such interval for any gamble $\boldsymbol{f}$, the function $V$ is defined for all
gambles.

The proof will be complete when it is shown that V is a hyper-utility
for the convex set of all gambles. Let f and f' be any two gambles and
p a number, 0 < p < 1. There is an interval containing f, f', g, h, and
pf + (1 − p)f'. In that interval the function V is a hyper-utility;
therefore V(pf + (1 − p)f') = pV(f) + (1 − p)V(f'), and V(f) ≤ V(f'),
if and only if f ≤ f'. ♦

4 The extension of utility to more general acts 

The requirement that an act have only a finite number of consequences
may seem, from a practical point of view, almost no requirement at all.
To illustrate, the number of time intervals that might possibly be the
duration of a human life can be regarded as finite, if you agree that
the duration may as well be rounded to the nearest minute, or second,
or microsecond, and that there is almost no possibility of its exceeding
a thousand years. More generally, it is plausible that, no matter what
set of consequences is envisaged, each consequence can be practically
identified with some element of a suitably chosen finite, though possibly
enormous, subset. It might therefore seem of little or no importance to
extend the concept of utility to acts having an infinite number of
consequences. If that argument were valid, it could easily be extended
to reach the conclusion that infinite sets are irrelevant to all practical
affairs, and therefore to all parts of applied mathematics. But it is one
of the most profound lessons of mathematical experience that infinite
sets, tactfully handled, can lead to great simplification of situations
that could, in principle, but only with enormous difficulty, be treated
in terms of finite sets. How difficult it would be to study geometry if
one made at the outset the "simplifying assumption" that to all intents
and purposes at most 10^{…} points in space can be discriminated from
one another! Again, it is generally more convenient and fruitful to think
of the annual cash income of an individual or firm as a continuous
variable with an infinite number of possible values than as a discrete
variable confined to some large finite number of values, even if it is
known that the income must be some integral number of cents less, say,
than 10^{…}.

One way to extend the concept of utility to acts with an infinite
number of consequences would be to postulate: If U[f] and U[g] both
exist (the values +∞ and −∞ being regarded as possible), f ≤ g, if
and only if U[f] ≤ U[g]. I see no serious objection to making this
assumption outright, though it might be complained that the assumption
is motivated more by general mathematical intuition and experience
than by intuitive standards of consistency among decisions, which I
have tried to take as my sole guide thus far. A statement almost as
strong as the one in question can, however, be derived on adjoining a
new postulate, P7, more in the spirit of P1-6. That rather technical
program will be carried out in the next several paragraphs. Those not
interested can safely skip to the paragraph following Corollary 1 on
page 80.

Suppose that every possible consequence of the act g is at least as
attractive to the person as the act f considered as a whole; then it seems
to me within the spirit of the sure-thing principle to conclude that
f ≤ g; the same might as fairly have been said for the relation ≥, and
also for the two relations ≤ given B and ≥ given B. This idea is
formalized in the following postulate, which, according to the
conventions of mathematical double-talk, is to be interpreted as two
propositions, one having ≤ and the other ≥ throughout.

P7 If f ≤ (≥) g(s) given B for every s ∈ B, then f ≤ (≥) g given B.

Attention has been called to the mathematically useful fact that, if
P1-6 apply to a relation ≤, then they also apply to any relation ≤
given B, provided B is not null. It is obvious that the same is true for
P1-7, a fact that will be used often. It is also noteworthy that P1-7
obviously imply the propositions that arise if in them every instance
of the sign ≤ is replaced by ≥ and every instance of ≥ is replaced by
≤. Therefore in any deduction from P1-7 every instance of the signs
≤ and ≥ can be reversed to produce a deduction that may be called
the symmetric dual of the original deduction. This remark, a legitimate
child of the principle of insufficient reason, has not been important
heretofore, because almost all deductions thus far made have been their
own symmetric duals. Since that will not be so of some of the lemmas
in the present section, much needless writing and thinking can be saved
by agreeing at the outset that, once a result is proved, it and its
symmetric dual may be used as if both had been explicitly proved.

Before going to work with P7, some may wish to see an example of
a mathematical structure satisfying P1-6 but not satisfying P7.
Moreover, understanding of such an example will do much to clarify the
uses to be made of P7. To construct the example, begin by letting S be a
set carrying a finitely additive probability measure P under which S
can be partitioned into subsets of arbitrarily small probability. Let
the set of consequences be the half-open interval of numbers 0 ≤ f < 1.
Let U(f) = f, U[f] = E(f), and

(1)   $V[f] = \lim_{\epsilon \to 0} P\{f(s) \ge 1 - \epsilon\}.$

Since the probability in (1) decreases with ε, there is no question about
the existence of the limit. Now let W[f] = U[f] + V[f], and define
f ≤ g to mean that W[f] ≤ W[g]. Checking postulates P1-6, it will
be found that the ≤ thus defined satisfies them all, and that what has
here been called U(f) is indeed a utility for ≤. But if, for example,
there is an f such that U[f] = V[f] = 1/2, P7 is violated, as can be seen
by comparing f to the act that, for each s, takes as value the maximum
of 1/2 and f(s). Whether there can be such an f, may, so far as I know,
depend on the choice of S and P. But, if the positive integers are taken
as S, and P is so chosen that though the probability of any one integer
is 0 the probability of the set of even integers is 1/2, a possibility
assured by the note to Section 3 of Chapter II on p. 231 of [B4], the
function equal to 0 at the odd integers and equal to (1 − 1/n) at each even
n is such an f. Finite, as opposed to countable, additivity seems to be
essential to this example; perhaps, if the theory were worked out in a
countably additive spirit from the start, little or no counterpart of P7
would be necessary.
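
The arithmetic behind the example can be checked step by step. The following is only a sketch, assuming the measure P on the positive integers described above (every finite set has probability 0, the even integers have probability 1/2) and assuming that E is monotone; the letter c for a generic consequence and the name g for the comparison act are introduced here for illustration.

```latex
% f(n) = 0 for odd n and f(n) = 1 - 1/n for even n; g(s) = max{1/2, f(s)}.
\begin{align*}
\tfrac12\Bigl(1 - \tfrac1N\Bigr) \le E(f) \le \tfrac12 \ \text{ for every even } N
  &\ \Longrightarrow\ U[f] = \tfrac12, \\
\{\, n : f(n) \ge 1 - \epsilon \,\} = \{\, n \text{ even} : n \ge 1/\epsilon \,\}
  &\ \Longrightarrow\ V[f] = \tfrac12, \quad W[f] = 1, \\
W[c] = c + 0 < 1 = W[f] \ \text{ for each consequence } c \in [0, 1)
  &\ \Longrightarrow\ c < f, \\
U[g] = \tfrac12 \cdot \tfrac12 + \tfrac12 = \tfrac34, \quad V[g] = \tfrac12
  &\ \Longrightarrow\ W[g] = \tfrac54 > W[f].
\end{align*}
```

Every value g(s) is a consequence and therefore lies below f, so P7 would require g ≤ f; yet W[g] > W[f] says that f < g, which is the violation.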
Several lemmas depending on P7 are now to be proved preparatory
to proving that U[f] governs preference for a very large class of acts.
It is to be understood throughout the section that U is any fixed utility.
The truth of each lemma is intuitively clear, in the sense that each could
justifiably be accepted as a postulate if need be. Since they are also
easy to prove and of secondary interest, condensed proofs will suffice.

Lemma 1 If, for every consequence h, f ≤ h, and g ≤ h, then f ≐ g.

Proof. Consider in the light of P7 that f ≤ g(s) and g ≤ f(s) for
every s. ♦

Lemma 2 If there exists a consequence f_0 such that f ≤ f_0, and if
U(f(s)) ≤ U_0 for every s, then there exists a gamble g such that f ≤ g
and U[g] ≤ U_0.

Proof. If U(f_0) ≤ U_0, then g can be taken to consist of f_0 alone.
Otherwise, let f_1 be any consequence such that U(f_1) ≤ U_0 and let g
be the unique mixture of f_0 and f_1 such that U(g) = U_0. ♦

Lemma 3

Hyp. 1. The B_i's, i = 1, ..., n, are a partition, and the U_i's are
corresponding numbers.

2. f is an act such that U(f(s)) ≤ U_i for s ∈ B_i.

3. f* is a gamble such that f* ≤ f.

Concl. U[f*] ≤ Σ U_i P(B_i).

Proof. If the lemma were false, it would be false even for some f* < f.
Then it may be assumed, modifying f if need be by means of P6 and
Lemma 1, that there exists for each i an f_i such that f ≤ f_i given B_i.
Now, in view of Lemma 2, there exists for each i a g_i such that f ≤ g_i
given B_i and U[g_i] ≤ U_i. Let g = Σ P(B_i) g_i, and observe that f* ≤
f ≤ g. Therefore, U[f*] ≤ U[g] = Σ P(B_i) U(g_i) ≤ Σ P(B_i) U_i. ♦

An act will be called bounded if its utility is, according to ordinary
mathematical usage, an essentially bounded random variable; the
notion is put in a more formal and self-contained way as follows. A bounded
act is an act f such that, for some two numbers U_0 and U_1, P{U_0 ≤
U(f(s)) ≤ U_1} = 1. The definition is clearly not dependent on the
choice of U.

Theorem 1 If f and g are bounded, then f ≤ g, if and only if
U[f] ≤ U[g].

Proof. If there exist g and h such that g ≤ f ≤ h, then there is,
by Theorem 2.4, a mixture f* of g and h such that f ≐ f*. The null event
on which U(f(s)) is not between U_0 and U_1 may as well be disregarded;
the rest can be partitioned into n + 1 events B_i, defined by the condition
that s ∈ B_i if and only if V_{i−1} ≤ U(f(s)) < V_i, i = 1, ..., n + 1,
where

(2)   $V_i = U_0 + \frac{i}{n}(U_1 - U_0), \qquad i = 0, 1, \ldots, n + 1.$

Applying Lemma 3 and its symmetric dual,

(3)   $\sum_i V_{i-1} P(B_i) \le U[f^*] \le \sum_i V_i P(B_i).$

Similarly, according to Exercise 3 of Appendix 1,

(4)   $\sum_i V_{i-1} P(B_i) \le U[f] \le \sum_i V_i P(B_i).$

Therefore

(5)   $|\,U[f] - U[f^*]\,| \le \sum_i (V_i - V_{i-1}) P(B_i) = (U_1 - U_0)/n,$

whence, n being arbitrary, U[f] = U[f*].
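
The squeeze between lower and upper sums used in this part of the proof can be illustrated numerically. The following is a minimal sketch, assuming a finite set of states with exact rational probabilities; the function names `expected_utility` and `grid_bounds` and the sample numbers are hypothetical and serve only as illustration.

```python
from fractions import Fraction

def expected_utility(utilities, probs):
    """Plain expectation of the utility values over a finite set of states."""
    return sum(Fraction(u) * p for u, p in zip(utilities, probs))

def grid_bounds(utilities, probs, u0, u1, n):
    """Lower and upper sums over the n + 1 cells [V_{i-1}, V_i) starting at u0."""
    step = Fraction(u1 - u0, n)              # common cell width (U_1 - U_0)/n
    lower = upper = Fraction(0)
    for u, p in zip(utilities, probs):
        i = (Fraction(u) - u0) // step + 1   # index of the cell containing u
        lower += (u0 + (i - 1) * step) * p   # V_{i-1} weighted by P(state)
        upper += (u0 + i * step) * p         # V_i weighted by P(state)
    return lower, upper

# Hypothetical act: four equally likely states with utilities in [0, 1].
utilities = [0, Fraction(1, 3), Fraction(2, 3), 1]
probs = [Fraction(1, 4)] * 4
lower, upper = grid_bounds(utilities, probs, 0, 1, 10)
```

Whatever the act, the two sums differ by exactly the cell width, so refining the grid pins the expected utility down as closely as desired.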

To consider the remaining case, suppose that the bounded act f
exceeds (is exceeded by) every consequence; call it for the moment big
(little). According to Lemma 1, all big (and, dually, all little) acts are
equivalent to one another. Furthermore, it is, for example, easily seen
that, if an act f is big, then for ε > 0,

(6)   $P\{U(f(s)) \ge \sup_{f'} U(f') - \epsilon\} = 1,$

where the supremum is taken over all consequences f'. (Some may be more
familiar with the notation "LUB" and "GLB," read "least upper bound"
and "greatest lower bound," than with the corresponding "sup" and
"inf," read "supremum" and "infimum." If even these older terms are
not familiar, see Exercise 4 of Appendix 2.) Therefore, if there are big
(little) acts, they all have the same expected utility, namely
sup U(f') (inf U(f')).

Suppose now that f ≤ g. It is possible that f and g are both little;
that f is little, and g is equivalent to some gamble; that f is little and
g big; that f and g are each equivalent to some gamble; that f is
equivalent to some gamble, and g is big; or, finally, that they are both
big. In each of these cases, a simple argument shows that U[f] ≤ U[g].
The converse arguments are similar. ♦

Corollary 1 If f and g are bounded, and P(B) > 0, then f ≤ g
given B, if and only if E(U(f) − U(g) | B) ≤ 0.

It would be possible to explore unbounded acts for which expected
utility exists to see whether expected utility governs preferences among
even such acts under postulates P1-7 or under some extension of them.
I do not think, however, that the question is sufficiently interesting to
warrant attention here, especially since there is some reason, first stated
by Gabriel Cramer in a letter partially reproduced in [B10], to postulate
that there are upper and lower bounds to utility, in which case all acts
would necessarily be bounded.

Even without P7, the postulates imply, in the following sense, that
no gamble has infinite or minus infinite utility.

An act f has infinite (minus infinite) utility, if and only if, for some
g < (>) h and for every ε > 0, there is a B with P(B) < ε and such
that the act equal to f on B and to g on ~B exceeds (is exceeded by) h.
A gamble or a consequence would be said to have infinite (minus
infinite) utility, if one of the acts corresponding to it had infinite (minus
infinite) utility.

Indeed, Theorem 2.4, a deduction from P1-6, obviously implies that
there are no infinite or minus infinite gambles or consequences. It
may, however, be mentioned that Pascal held that, in just the sense
at hand, salvation is an infinite consequence ([P2], pp. 189-191). Again,
it is often said, in effect, that immediate death is to a person a
consequence of minus infinite utility, but casual observation shows
that this is not true of anyone, at least not of anyone who would cross
the street to greet a friend. In the same vein, medicine often gives lip
service to the idea that the death of a patient is of minus infinite utility,
and, of course, doctors do go to great lengths to keep their patients
alive; but a doctor who took the idea too seriously would make a
nuisance of himself and soon find himself with no patients to treasure.

If the utility of consequences is unbounded, say from above,† then,
even in the presence of P1-7, acts (though not gambles) of infinite
utility can easily be constructed. My personal feeling is that,
theological questions aside, there are no acts of infinite or minus infinite
utility, and that one might reasonably so postulate, which would amount
to assuming utility to be bounded.

† That is, if, for every number V, there is a consequence f such that
V < U(f). This manner of speaking is permissible, because in view of
Theorem 3.3, if one utility is bounded, all are.

Justifiable though it might be, that assumption would entail a
certain mathematical awkwardness in many practical contexts. For
example, as will be discussed at greater length in Chapter 15, it sometimes
seems reasonable to suppose that the penalty for acting as though a
particular unknown number were λ instead of its true value, μ, is
proportional to δ² = (μ − λ)². But, if the possible values of μ are
unbounded, then so are the possible values of δ, so utility is here taken
to be unbounded. On close scrutiny of such an example one always finds
that it is not really reasonable to assume the penalty even roughly
proportional to δ² for large values of δ², but rather that large values
are so improbable that the error made in misappraising the penalty
associated with them is negligible compared to the saving in simplicity
resulting from the misappraisal. If the assumption of bounded utility
were made part of the theory of personal probability, then any example
in which unbounded utility is used for mathematical simplicity would be
in contradiction to the postulates. I propose, therefore, not to assume
bounded utility formally, but to remember that problems involving
unbounded utility are to be handled cautiously.

To take stock of the chapter thus far, utility having been established,
it is now superfluous to consider that consequences may be of all sorts,
since the postulates imply that in virtually every context a consequence
is adequately characterized by its utility, some one utility function
having been chosen from the linear family of possibilities. Therefore,
unless the contrary is clearly indicated, f, g, and h will henceforth mean
not exactly consequences in the sense used to date, but rather real
numbers measuring utility in units to be called utiles. Correspondingly,
an act f will henceforth be understood to be a real-valued random
variable. The entire theory of preference, at least for bounded acts, can
now be summarized by the following résumé.

R  f ≤ g given B, if and only if P(B) = 0, or E(f − g | B) ≤ 0.

From now on, though not formulated as a postulate, it is to be assumed
without further quibbling that R holds, provided only that E(f) and
E(g) exist and are finite; no attempt will be made to compare acts for
which the expected value does not exist or is infinite.
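
For a world with finitely many states, the rule R can be written out directly. The following is a minimal sketch, assuming acts are given as tables of utiles and probabilities as exact fractions; the function name `not_preferred_given` and the three-state example are hypothetical.

```python
from fractions import Fraction

def not_preferred_given(f, g, prob, event):
    """Return True when f <= g given `event`, following rule R."""
    p_event = sum(prob[s] for s in event)
    if p_event == 0:
        return True                      # null event: R holds trivially
    # Conditional expectation E(f - g | event) over a finite state set.
    diff = sum((f[s] - g[s]) * prob[s] for s in event) / p_event
    return diff <= 0

# Hypothetical three-state world; state "c" is null.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
f = {"a": 1, "b": 0, "c": 5}
g = {"a": 0, "b": 2, "c": 0}
```

Here f is not preferred to g given {a, b}, since the conditional expectation of f − g is −1/2, whereas given {a} alone the comparison reverses; given the null event {c} each act is not preferred to the other.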

If a person is free to decide among a set F of acts, he will presumably
choose one the expectation of which is v(F), where

(7)   $v(F) = \sup_{f \in F} E(f),$

provided that such a one exists. This provision must be mentioned,
even though a set F for which v(F) = ∞ will, by convention, not be
considered to give rise to a valid decision problem; for, if F is infinite in
number, there may be no act in F with expectation quite as great as
v(F). Nonetheless, v(F) may, in a sense, be regarded as the value or
utility of the set of acts F, as is discussed in the penultimate paragraph
of § 6.5.
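
A simple instance in which the supremum is finite but not attained may make the proviso concrete. The set of acts below is hypothetical, chosen only for illustration: each f_n is the constant act paying 1 − 1/n utiles.

```latex
% Hypothetical set of acts F = {f_1, f_2, ...}, f_n constant at 1 - 1/n utiles.
\[
  E(f_n) = 1 - \frac{1}{n} < 1 \quad (n = 1, 2, \ldots),
  \qquad
  v(F) = \sup_{n} \Bigl(1 - \frac{1}{n}\Bigr) = 1 .
\]
```

No act in this F has expectation equal to v(F), although acts with expectation arbitrarily close to it are available.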

5 Small worlds 

Allusion was made in the penultimate paragraph of § 2.5 to the
practical necessity of confining attention to, or isolating, relatively simple
situations in almost all applications of the theory of decision developed
in this book. As was mentioned there, I find it difficult to say with
any completeness how such isolated situations are actually arrived at
and justified. The purpose of the present section is to take some steps
toward the solution of that problem or, at any rate, to set the problem
forth as clearly as I can. This section, though important for a critical
evaluation of the thesis of this book, is not essential to a casual reading.

Making an extreme idealization, which has in principle guided the
whole argument of this book thus far, a person has only one decision
to make in his whole life. He must, namely, decide how to live, and
this he might in principle do once and for all. Though many, like
myself, have found the concept of overall decision stimulating, it is
certainly highly unrealistic and in many contexts unwieldy.† Any claim
to realism made by this book (or indeed by almost any theory of
personal decision of which I know) is predicated on the idea that some of
the individual decision situations into which actual people tend to
subdivide the single grand decision do recapitulate in microcosm the
mechanism of the idealized grand decision. One application of the theory
of utility to overall decisions has, however, been attempted by Milton
Friedman in [F11].

† Unrealistic though the concept is, it would be a mistake, arising out
of elliptical presentation, to suppose that the concept predicates the
choice of a complete lifelong policy by new-born babies. If a person ever
reached such a level of maturity as to be able to make a lifelong choice
for his life from that time on, he would then become a person to whom
the concept could be literally applied.

The problem of this section is to say as clearly as possible what
constitutes a satisfactory isolated decision situation. The general method
of attack I propose to follow, for want of a better one, is to talk in terms
of the grand situation, tongue in cheek, and in those terms to analyze
and discuss isolated decision situations. I hope you will be able to
agree, as the discussion proceeds, that I do not lean too heavily on the
concept of the grand decision situation.

Consider a simple example. Jones is faced with the decision whether
to buy a certain sedan for a thousand dollars, a certain convertible also
for a thousand dollars, or to buy neither and continue carless. The
simplest analysis, and the one generally assumed, is that Jones is
deciding between three definite and sure enjoyments, that of the sedan,
the convertible, or the thousand dollars. Chance and uncertainty are
considered to have nothing to do with the situation. This simple
analysis may well be appropriate in some contexts; however, it is not
difficult to recognize that Jones must in fact take account of many
uncertain future possibilities in actually making his choice. The relative
fragility of the convertible will be compensated only if Jones's hope to
arrange a long vacation in a warm and scenic part of the country
actually materializes; Jones would not buy a car at all if he thought it
likely that he would immediately be faced by a financial emergency
arising out of the sickness of himself or of some member of his family;
he would be glad to put the money into a car, or almost any durable
goods, if he feared extensive inflation. This brings out the fact that
what are often thought of as consequences (that is, sure experiences of
the deciding person) in isolated decision situations typically are in
reality highly uncertain. Indeed, in the final analysis, a consequence is
an idealization that can perhaps never be well approximated. I
therefore suggest that we must expect acts with actually uncertain
consequences to play the role of sure consequences in typical isolated
decision situations.

Suppose now, to elaborate the example, that Jones is presented with
a choice between tickets in several different lotteries such that,
whichever he chooses and whatever tickets are drawn, he will win either
nothing, the sedan, the convertible, or a thousand dollars. None of
these four consequences (not even "nothing") is actually a sure
consequence in the strict sense, as I think you will now understand. I
propose to analyze Jones's present decision situation in terms of a
"small world." The more colloquial Greek word, microcosm, will be
reserved for a special kind of small world to be described later. To
describe the state of the small world is to say which prize is associated
with each of the tickets offered to Jones. The small-world acts actually
available to Jones are acceptance of one or another of the tickets.
The generic small-world act is an arbitrary function taking as its value
one of the four small-world consequences according to which small-
world state obtains.

It will be noticed that the small-world states are in fact events in
the grand world, that indeed they constitute a partition of the grand
world. If there are an infinite number of small-world states, as indeed
there must be, if the small world is to satisfy the postulates P1-7, then
the partition in question becomes an infinite partition.† These
considerations lead to the following technical definitions.

† Technical note: It is mathematically more general and elegant not to
insist that the small world have states at all, but rather to speak of a
special class of events as small-world events. This class should be closed
under complements and finite unions. In short, the small-world events,
and thereby the small world itself, constitute a Boolean subalgebra of
the Boolean algebra of the grand-world events.

Let the grand world $S$ be, as always, a set with elements $s, s', \ldots$.
The grand-world consequences $F$ may as well be taken to be a bounded
set of real numbers. The grand-world acts are then real-valued functions
$\mathbf{f}, \mathbf{g}, \mathbf{h}, \ldots$. The preference ordering between acts is determined
by the condition that $\mathbf{f} \le \mathbf{g}$ if and only if

(1)   $E(\mathbf{f} - \mathbf{g}) \le 0,$

where the expected value indicated in (1) is derived from a probability
measure $P$ characteristic of the grand world or, to be more exact, of
the person's attitude toward the grand world.

The construction of a small world $\bar{S}$ from the grand world $S$ begins
with the partition of $S$ into subsets, or small-world states $\bar{s}, \bar{s}', \ldots$ (not
necessarily finite in number). Throughout this technical discussion, it
will be necessary to bear in mind certain double interpretations such
as that $\bar{s}$ is both an element of $\bar{S}$ and a subset of $S$. Strictly speaking, a
small-world event $\bar{B}$ in $\bar{S}$ is a collection of subsets of $S$ and not itself a
subset of $S$. However, the union of all the elements of $\bar{B}$, regarded as
subsets of $S$, is an event in $S$; call it $[\bar{B}]$.

The small world, as I mean to define it, is determined not only by
the definition of a state, but also by the definition of small-world
consequences. A small-world consequence is a grand-world act. A set $\bar{F}$ of
grand-world acts, regarded as small-world consequences, is thus part of
the definition of any given small world. It will be mathematically
simplest, and cost little if anything in insight, to suppose that the
elements of $\bar{F}$ are finite in number. They will be denoted $\bar{f}, \bar{g}, \bar{h}, \ldots$,
and, when the small-world consequence $\bar{f}$ is recognized as a grand-world
act, $\bar{f}(s)$ will denote the grand-world consequence of $\bar{f}$ at the grand-
world state $s$.

A small-world act $\bar{\mathbf{f}}$ is, of course, a function from small-world states $\bar{s}$
to small-world consequences $\bar{f}$. In this isolated technical discussion, we
will hobble along with the notations $\bar{\mathbf{f}}(\bar{s})$ for the small-world
consequence attached to $\bar{s}$ by $\bar{\mathbf{f}}$, and $\bar{\mathbf{f}}(s, \bar{s})$ for the grand-world consequence
attached to $s$ by $\bar{\mathbf{f}}(\bar{s})$ recognized as a grand-world act. Each small-
world act $\bar{\mathbf{f}}$ gives rise to a unique grand-world act $\hat{\mathbf{f}}$, defined thus:

(2)   $\hat{\mathbf{f}}(s) =_{\mathrm{Df}} \bar{\mathbf{f}}(s, \bar{s}(s)),$

where $\bar{s}(s)$ means that small-world state $\bar{s}$ of which the grand-world
state $s$ is an element.

The distinction between the small-world act $\bar{\mathbf{f}}$ and the grand-world
act $\hat{\mathbf{f}}$ to which it gives rise, like some other distinctions I have
thought it worth while to make in the present complicated context, is
perhaps pedantic. At any rate, it is to be understood as part of the
definition of a small world that $\bar{\mathbf{f}} \le \bar{\mathbf{g}}$ if and only if $\hat{\mathbf{f}} \le \hat{\mathbf{g}}$, that is, in
view of (1), if and only if $E(\hat{\mathbf{f}}) \le E(\hat{\mathbf{g}})$. In this connection, it is useful
to note that

(3)   $E(\hat{\mathbf{f}}) = \sum_{\bar{k} \in \bar{F}} E(\hat{\mathbf{f}} \mid \bar{\mathbf{f}}(\bar{s}(s)) = \bar{k})\, P(\bar{\mathbf{f}}(\bar{s}(s)) = \bar{k})$
      $= \sum_{\bar{k} \in \bar{F}} E(\bar{k} \mid \bar{\mathbf{f}}(\bar{s}(s)) = \bar{k})\, P(\bar{\mathbf{f}}(\bar{s}(s)) = \bar{k}).$

It may be advantageous to review (3), and thereby the whole
technical definition of a small world, in terms of an example. A small-world
act, typified by the purchase of a lottery ticket, amounts to accepting
the consequences of one of several ordinary grand-world acts according
to which element of a partition does in fact obtain. For example, the
participant in a lottery may drive away a car, lead away a goat, face
a firing squad, or remain in the status quo, according to the terms of
the lottery and according to which ticket he has in fact drawn. Letting
the example of the lottery stand for the general situation, the expected
utility of a lottery ticket can be computed by the partition formula
(3.5.3) from the conditional expectation associated with each ticket,
which is what (3) does.
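
The bookkeeping in (3) can be made concrete on a finite toy world. The following is only a sketch under stated assumptions: the four grand-world states, their probabilities, the two prizes, and all names (`small_state_of`, `induced_act`, `expectation_by_partition`) are hypothetical, and a finite small world of this kind cannot of course satisfy all the postulates.

```python
from fractions import Fraction

# Hypothetical grand world: four equally likely states.
P = {"s1": Fraction(1, 4), "s2": Fraction(1, 4),
     "s3": Fraction(1, 4), "s4": Fraction(1, 4)}
# Partition of the grand world into two small-world states, A and B.
small_state_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
# Small-world consequences are grand-world acts (utiles per grand-world state).
consequences = {
    "car":  {"s1": 10, "s2": 6, "s3": 8, "s4": 4},
    "goat": {"s1": 1, "s2": 1, "s3": 3, "s4": 1},
}
ticket = {"A": "car", "B": "goat"}   # a small-world act

def induced_act(small_act):
    """Grand-world act to which the small-world act gives rise, as in (2)."""
    return {s: consequences[small_act[small_state_of[s]]][s] for s in P}

def expectation(act):
    return sum(act[s] * P[s] for s in P)

def expectation_by_partition(small_act):
    """Right-hand side of (3): sum over prizes of E(prize | prize won) * P(prize won)."""
    total = Fraction(0)
    for name, prize in consequences.items():
        event = [s for s in P if small_act[small_state_of[s]] == name]
        p_event = sum(P[s] for s in event)
        if p_event == 0:
            continue                     # prize never awarded by this ticket
        conditional = sum(prize[s] * P[s] for s in event) / p_event
        total += conditional * p_event
    return total
```

For this ticket both routes give 5 utiles: the car is worth 8 on average where it is won, the goat 2, and each is won with probability 1/2.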

It may fairly be said that a lottery prize is not an act, but rather the
opportunity to choose from a number of acts. Thus a cash prize puts
its possessor in a position to choose among many purchases he could
not otherwise afford. I believe that analysis to be more nearly correct,
but it is more complicated, and, if one thinks of each set of acts made
available by a lottery prize as represented by a best act of that set,
the more complicated analysis seems superfluous, at least in a first
attack.

A small world is completely satisfactory for the use to which I mean
to put it, if and only if it itself satisfies the seven postulates and leads
to (more technically, agrees with) a probability $\bar{P}$ such that

(4)   $\bar{P}(\bar{B}) = P([\bar{B}])$

for all $\bar{B} \subset \bar{S}$ and has a utility $\bar{U}$ such that

(5)   $\bar{U}(\bar{f}) = E(\bar{f})$

for all $\bar{f} \in \bar{F}$. For the present context, call such a completely
satisfactory small world a microcosm; if the small world satisfies the
postulates, but does not necessarily admit $\bar{P}$ as its probability nor $\bar{U}$ as a
utility, call it a pseudo-microcosm.

To display the circumstances under which a small world is a pseudo-
microcosm, I shall briefly comment on each of the postulates in the
form given on the end papers of this book, referring to them here as
$\bar{P}$1-7, as opposed to P1-7, to emphasize that they are here being
considered with respect to $\bar{S}$ and $\bar{F}$.

$\bar{P}$1 Simple ordering

Automatically satisfied. Indeed it is directly implied by P1.

$\bar{P}$2 Conditional preference well defined

Automatic.

$\bar{P}$3 Conditional preference does not affect consequences

Requires exactly that, for every $\bar{f}, \bar{g} \in \bar{F}$, and $\bar{B} \subset \bar{S}$, either

a. $\bar{f} \le \bar{g}$ given $[\bar{B}]$, if and only if $\bar{f} \le \bar{g}$; or

b. $\bar{h} \le \bar{k}$ given $[\bar{B}]$, for every $\bar{h}, \bar{k} \in \bar{F}$.

In these inequalities the elements of $\bar{F}$ are of course interpreted as
grand-world acts.

$\bar{P}$4 Qualitative personal probability well defined

Requires exactly that, if $\bar{f} < \bar{g}$ and $\mathbf{h}_B \le \mathbf{h}_C$, where

(6)   $\mathbf{h}_B(s) = \bar{g}(s)$ for $s \in [\bar{B}]$,
      $\mathbf{h}_B(s) = \bar{f}(s)$ for $s \in \sim[\bar{B}]$;
      $\mathbf{h}_C(s) = \bar{g}(s)$ for $s \in [\bar{C}]$,
      $\mathbf{h}_C(s) = \bar{f}(s)$ for $s \in \sim[\bar{C}]$,

then $\mathbf{h}'_B \le \mathbf{h}'_C$, where $\mathbf{h}'_B$ and $\mathbf{h}'_C$ are defined in terms of $\bar{f}', \bar{g}'$,
$\bar{f}' < \bar{g}'$, in analogy with (6).

This postulate is automatic in case $\bar{F}$ has at most two elements.

$\bar{P}$5 The person has some definite preference

Requires $\bar{f} < \bar{g}$ for some $\bar{f}, \bar{g} \in \bar{F}$.

$\bar{P}$6 Partition of worlds into tiny events

It is clear that this postulate is not automatic, that is, it is not
implied by the validity of P1-7 for the grand world. It is not even
implied by P1-7 together with $\bar{P}$1-5, though in the presence of all these
$\bar{P}$6 could undoubtedly be weakened. There seems to be little to gain
in the present context by reducing $\bar{P}$6 to such minimal terms, nor by
expressing it, as $\bar{P}$1-5 have been expressed, in grand-world terms alone,
for $\bar{P}$6 does not lend itself easily to such treatment, though it would be
easy to decide in any instance whether $\bar{P}$6 obtained without undue
reference to the grand world.

$\bar{P}$7 Strong form of sure-thing principle

Automatic, in view of the explicit assumption that $\bar{F}$ has only a
finite number of elements.

To summarize, a small world is a pseudo-microcosm, if and only if
it satisfies $\bar{P}$3-6. The possibility of enlarging an arbitrary small world
in such a way as to satisfy those conditions has already been implicitly
discussed in connection with P3-6. To recall the arguments that were
adduced, one might review the example about the egg in § 3.1, and
the further discussion of that example in the opening paragraph of
§ 3.2, the remark in § 3.2, introducing P5, and the example about the
coin following P6' in § 3.3.

It is encouraging to possess the arguments just cited tending to show
that any small world can without overwhelming difficulty be embedded
in a somewhat larger small world that is a pseudo-microcosm. A pseudo-
microcosm is, however, completely satisfactory, only if it is actually a
microcosm, that is, only if it leads to a probability measure and a
utility well articulated with those of the grand world. The problem of
deciding under what circumstances that occurs is much facilitated by
the fact that the probability measure and a utility of a pseudo-microcosm
can be written down explicitly, as the next few paragraphs show.

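To study the problem, suppose the small world is a pseudo-microcosm.
Then, in view of $\bar{P}$5, let $\bar{g}, \bar{h}$ be elements of $\bar{F}$ such that $\bar{g} < \bar{h}$,
and let

(7)   $Q(\bar{B}) =_{\mathrm{Df}} \dfrac{E(\bar{h} - \bar{g} \mid [\bar{B}])}{E(\bar{h} - \bar{g})}\, P([\bar{B}]) = \dfrac{\int_{[\bar{B}]} \{\bar{h}(s) - \bar{g}(s)\}\, dP(s)}{\int_{S} \{\bar{h}(s) - \bar{g}(s)\}\, dP(s)}.$

By using $\bar{P}$3 to check the positivity, it is easily verified that $Q$ is a
probability measure on $\bar{S}$. The probability measure $Q$ agrees with the
relation ≤ between small-world events, which is easily verified on
rewriting (3) for the special small-world act $\bar{\mathbf{h}}_{\bar{B}}$ that takes the value $\bar{h}$
for $\bar{s} \in \bar{B}$ and $\bar{g}$ for $\bar{s} \in \sim\bar{B}$; writing $\hat{\mathbf{h}}_{\bar{B}}$ for the grand-world act to which
it gives rise,

(8)   $E(\hat{\mathbf{h}}_{\bar{B}}) = E(\bar{h} \mid [\bar{B}])\, P([\bar{B}]) + E(\bar{g} \mid \sim[\bar{B}])\, P(\sim[\bar{B}])$
      $= E(\bar{h} - \bar{g} \mid [\bar{B}])\, P([\bar{B}]) + E(\bar{g})$
      $= E(\bar{h} - \bar{g})\, Q(\bar{B}) + E(\bar{g}).$

Since $\bar{g}$ and $\bar{h}$ are essentially arbitrary, there are many ways to
construct a probability measure that agrees with the relation ≤ between
small-world events; but, in the presence of $\bar{P}$1-6, all of them must (in
view of Corollary 3.3.1) be the same as $Q$. That consideration leads to
the formula

(9)   $E(\bar{f} - \bar{f}' \mid [\bar{B}])\, P([\bar{B}]) = E(\bar{f} - \bar{f}')\, Q(\bar{B})$

for all $\bar{f}, \bar{f}' \in \bar{F}$ and $\bar{B} \subset \bar{S}$.

Using (9) and recalling that $\bar{U}(\bar{f})$ has been defined as $E(\bar{f})$, (3) can
be rewritten thus:

(10)  $E(\hat{\mathbf{f}}) = E(\bar{g}) + \sum_{\bar{k}} E(\bar{k} - \bar{g} \mid \bar{\mathbf{f}}(\bar{s}(s)) = \bar{k})\, P(\bar{\mathbf{f}}(\bar{s}(s)) = \bar{k})$
      $= \sum_{\bar{k}} \bar{U}(\bar{k})\, Q(\bar{\mathbf{f}}(\bar{s}) = \bar{k}).$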

The question whether a given pseudo-microcosm is really a microcosm
is the question whether $Q(\bar{B}) = P([\bar{B}])$ and whether $\bar{U}$ is a utility
for the pseudo-microcosm. The answer to the second part is immediate
and, I think, somewhat surprising, for (10) shows that for any pseudo-
microcosm $\bar{U}$ is indeed a utility.

Unfortunately, the condition $Q(\bar{B}) = P([\bar{B}])$ is not also automatic.
The possibility of its failing to be satisfied is illustrated by the following
simple mathematical example. Let $S$ be the unit square $0 \le x, y \le 1$,
and let

(11)  $E(\mathbf{f}) = \int_0^1 \int_0^1 \mathbf{f}(x, y)\, dx\, dy.$

It is of no real moment that the integral in (11), if understood in the
Lebesgue or Riemann sense, is not defined for all bounded functions.
Let the elements of $\bar{S}$ be the vertical line segments, $x$ = constant.
Finally, suppose that the elements of $\bar{F}$ consist of the function zero and
any finite number of non-negative multiples of a fixed positive function
$\bar{h} = \bar{h}(x, y)$. It is easy to verify that $\bar{S}$ as thus defined is a pseudo-microcosm
and that



(13)

Unless q is 1 for every x′, which will not at all typically be the case,
S̄ is not really a microcosm.
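The failure of Q to agree with P in the unit-square example can be illustrated numerically. The following Python sketch evaluates (7) with g = 0 for one hypothetical choice of the fixed positive function, h(x, y) = 1 + x, and for the small-world event made up of the vertical segments with x ≤ x′. The particular h, the helper names, and the midpoint-rule grid are assumptions made only for illustration; none of them comes from the text.

```python
def expectation(f, n=200):
    """Approximate E(f) of (11), the integral of f over the unit square,
    by the midpoint rule on an n-by-n grid."""
    step = 1.0 / n
    pts = [(i + 0.5) * step for i in range(n)]
    return sum(f(x, y) for x in pts for y in pts) * step * step


def h(x, y):
    # Hypothetical fixed positive function; any positive h could be used.
    return 1.0 + x


def Q(x_max):
    """Q of the event 'x <= x_max' from (7) with g = 0:
    the integral of h over the event, divided by E(h)."""
    def restricted(x, y):
        return h(x, y) if x <= x_max else 0.0
    return expectation(restricted) / expectation(h)


def P(x_max):
    """Grand-world probability of the same event under (11): its area."""
    return x_max
```

With this h, Q(0.5) works out to 5/12, whereas P assigns the same event probability 1/2, so this particular pseudo-microcosm is not a microcosm. Taking h constant instead makes the two measures coincide.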

The general condition that a pseudo-microcosm be a microcosm, i.e.,
that Q(B̄) = P([B̄]), is evidently, in view of (9),

(14)  $$E(f - f' \mid [\bar{B}]) = E(f - f')$$

for every f, f′ ∈ F̄ and every B̄ for which P([B̄]) > 0. Incidentally,
that condition alone practically implies that a small world S̄, not
necessarily assumed to be a pseudo-microcosm, is a real microcosm. More
exactly, it implies all the postulates P1-7, except P6; and it implies
that the probability measure P agrees with the relation ≤ between
small-world events. Also, if a small world is a pseudo-microcosm, it is
enough that (14) should hold for some pair of functions for which the
right-hand side of the equation does not vanish.

Equation (14) is, however, unsatisfactory in that it seems incapable
of verification without taking the grand world much too seriously.
Some consolation may derive from the fact that if f and f′ are constants
they automatically satisfy (14). Two such absolute, or grand-world,
consequences would suffice, for, as has just been remarked, it is
sufficient that (14) be satisfied for two materially different
small-world consequences, in the presence of P1-7 (which are verifiable
without any detailed knowledge of the grand world). It must, however,
be admitted, as has already been mentioned, that the very idea of a
grand-world consequence takes the grand world pretty seriously, a point
forced into my reluctant mind by a conversation with Francesco
Brambilla.

I feel, if I may be allowed to say so, that the possibility of being
taken in by a pseudo-microcosm that is not a real microcosm is remote,
but the difficulty I find in defining an operationally applicable
criterion is, to say the least, ground for caution.

There certainly seem to be cases in which one could confidently
assume (14), though thus far formal analysis of the source of such
security escapes me. Consider, for example, a lottery in which numbered
tickets are drawn from a drum. It seems clear that for an ordinary
person the outcome of the lottery is utterly irrelevant to his life,
except through the rules of the lottery itself. In other terms equally
loose, the value of a thousand dollars, or of a car, to a person would
not ordinarily depend at all on what numbers were drawn in a lottery,
unless the person himself (or perhaps some other person or organization
with whom he had some degree of contact) held tickets in the lottery. A
more precise formulation, which does indeed imply (14), is that the
events that represent the outcome of the lottery are all statistically
independent of the grand-world acts, or functions, that typically enter
as prizes in a lottery. This suggests once more that it would be
desirable, if possible, to find a simple qualitative personal
description of independence between events. (Compare the first
paragraph after (3.5.2).)

6 Historical and critical comments on utility

A casual historical sketch of the concept of utility will perhaps have
some interest as history. At any rate, most of the critical ideas
pertaining to utility that I wish to discuss find their places in such a
sketch as conveniently as in any other organization I can devise. Much
more detailed material on the history of utility, especially in so far
as the economics of risk bearing is concerned, is to be found in Arrow’s
review article [A6]. Stigler’s historical study [S18] emphasizes the
history of the now almost obsolete economic notion of utility in
riskless situations, a notion still sometimes confused with the one
under discussion.

As was mentioned in § 4.5, the earliest mathematical studies of
probability were largely concerned with gambling, particularly with the
question of which of several available cash gambles is most
advantageous. Early probabilists advanced the maxim that the gamble with
the highest expected winnings is best or, in terms of utility, that
wealth measured in cash is a utility function. Some sense can be seen in
that maxim, which will here be called by its traditional though
misleading name, the principle of mathematical expectation. First, it
has often been argued that the principle follows for the long run from
the weak law of large numbers, applied to large numbers of independent
bets, in each of which only sums that the gambler considers small are to
be won or lost. Second, Daniel Bernoulli, who, in [B10], was one of the
first to introduce a general idea of utility corresponding to that
developed in the preceding three sections, made the following analysis
of the principle, which justifies its application in limited but
important contexts. If the consequences f to be considered are all
quantities of cash, it is reasonable to suppose that U(f) will change
smoothly with changes in f. Therefore, if a person’s present wealth is
f₀, and he contemplates various gambles, none of which can greatly
change his wealth, the utility function can, for his particular purpose,
be approximated by its tangent at f₀, that is,

(1)  $$U(f) \simeq U(f_0) + (f - f_0)\,U'(f_0),$$

a linear function of f. Since a constant term is irrelevant to any
comparison of expected values, the approximation amounts to regarding
utility as proportional to wealth, that is, to following the principle
of mathematical expectation. So far as I know, the only other argument
for the principle that has ever been advanced is one concerning equity
between two players. As Bernoulli says, that argument is irrelevant at
best, and neither of the relevant arguments justifies categorical
acceptance of the principle. None the less, the principle was at first
so categorically accepted that it seemed paradoxical to mathematicians
of the early eighteenth century that presumably prudent individuals
reject the principle in certain real and hypothetical decision
situations.
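The tangent-line argument behind (1) can be checked numerically. The sketch below assumes, purely for illustration, a logarithmic utility and a present wealth of 10,000; it compares the exact change in expected utility produced by a gamble with the change predicted by (1), which is U′(f₀) times the expected winnings. The function names and the numbers are hypothetical.

```python
import math


def expected(pairs, fn=lambda v: v):
    """Expected value of fn(outcome) over (outcome, probability) pairs."""
    return sum(p * fn(v) for v, p in pairs)


def utility_gain(gamble, wealth, U=math.log):
    """Exact change in expected utility from accepting the gamble,
    where the gamble lists (change in wealth, probability) pairs."""
    return expected([(wealth + x, p) for x, p in gamble], U) - U(wealth)


def tangent_gain(gamble, wealth, dU=lambda w: 1.0 / w):
    """Change predicted by (1): U'(f0) times the expected winnings."""
    return dU(wealth) * expected(gamble)


small = [(12.0, 0.5), (-10.0, 0.5)]        # stakes tiny relative to wealth
large = [(6000.0, 0.5), (-5000.0, 0.5)]    # stakes comparable to wealth
```

For the small gamble the two quantities agree closely, so ranking by expected winnings is harmless. For the large gamble the approximation predicts a gain while the exact expected utility falls, which is the sense in which the principle of mathematical expectation holds only in limited contexts.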
Daniel Bernoulli (1700-1782), in the paper [B10], seems to have
been the first to point out that the principle is at best a rule of
thumb, and he there suggested the maximization of expected utility as a
more valid principle. Daniel Bernoulli’s paper reproduces portions of a
letter from Gabriel Cramer to Nicholas Bernoulli, which establishes
Cramer’s chronological priority to the idea of utility and most of the
other main ideas of Bernoulli’s paper. But it is Bernoulli’s formulation
together with some of the ideas that were specifically his that became
popular and have had widespread influence to the present day. It is
therefore appropriate to review Bernoulli’s paper in some detail.

Being unable to read Latin, I follow the German edition [B11].
Bernoulli begins by reminding his readers that the principle of
mathematical expectation, though but weakly supported, had theretofore
dominated the theory of behavior in the face of uncertainty. He says
that, though many arguments had been given for the principle, they
were all based on the irrelevant idea of equity among players. It seems
hard to believe that he had never heard the argument justifying the
principle for the long run, even though the weak law of large numbers
was then only in its mathematical infancy. Ars Conjectandi [B12], then
a fairly up-to-date and most eminent treatise on probability, does seem
to give only the argument about equity, and that in countless forms.
This treatise by Daniel’s uncle, Jacob (= James) Bernoulli (1654-1705),
incidentally, contains the first mathematical advance toward the weak
law, proving it for the special case of repeated trials.

Many examples show that the principle of mathematical expectation is
not universally applicable. Daniel Bernoulli promptly presents one:
“To justify these remarks, let us suppose a pauper happens to acquire a
lottery ticket by which he may with equal probability win either
nothing or 20,000 ducats. Will he have to evaluate the worth of the
ticket as 10,000 ducats; and would he be acting foolishly, if he sold it
for 9,000 ducats?”

Other examples occur later in the paper as illustrations of the use
of the utility concept. Thus a prudent merchant may insure his ship
against loss at sea, though he understands perfectly well that he is
thereby increasing the insurance company’s expected wealth, and to
the same extent decreasing his own. Such behavior is in flagrant
violation of the principle of mathematical expectation, and to one who
held that principle categorically it would be as absurd to insure as to
throw money away outright. But the principle is neither obvious nor
deduced from other principles regarded as obvious; so it may be
challenged, and must be, because everyone agrees that it is not really
insane to insure.

Bernoulli cites a third, now very famous, example illustrating that
men of prudence do not invariably obey the principle of mathematical
expectation. This example, known as the St. Petersburg paradox (because
of the journal in which Bernoulli’s paper was published), had earlier
been publicized by Nicholas Bernoulli,† and Daniel acknowledges it as
the stimulus that led to his investigation of utility. Suppose, to
state the St. Petersburg paradox succinctly, that a person could choose
between an act leaving his wealth fixed at its present magnitude or one
that would change his wealth at random, increasing it by (2^n - f)
dollars with probability 2^(-n) for every positive integer n. No matter
how large the admission fee f may be, the expected income of the random
act is infinite, as may easily be verified. Therefore, according to the
principle of mathematical expectation, the random act is to be
preferred to the status quo. Numerical examples, however, soon convince
any sincere person that he would prefer the status quo if f is at all
large. If f is $128, for example, there is only 1 chance in 64 that a
person choosing the random act will so much as break even, and he
will otherwise lose at least $64, a jeopardy for which he can seek
compensation only in the prodigiously improbable winning of a
prodigiously high prize.
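The arithmetic of the St. Petersburg example can be verified directly. The following Python sketch, with hypothetical helper names, computes the chance of at least breaking even for a given admission fee and the partial sums of the series for the expected income, using exact fractions so that no rounding enters.

```python
from fractions import Fraction


def break_even_probability(fee):
    """Probability that the prize 2**n is at least the fee,
    when n occurs with probability 2**-n (n = 1, 2, ...)."""
    n = 1
    while 2 ** n < fee:
        n += 1
    # P(N >= n) is the sum of 2**-k over k >= n, which equals 2**-(n - 1).
    return Fraction(1, 2 ** (n - 1))


def partial_expected_income(fee, max_n):
    """Partial sum, over n = 1..max_n, of 2**-n * (2**n - fee)."""
    return sum(Fraction(1, 2 ** n) * (2 ** n - fee)
               for n in range(1, max_n + 1))
```

For a fee of 128 the break-even probability is 1/64, as stated above. Each term of the series equals 1 - fee/2^n, so the partial sums grow without bound as max_n increases, however large the fee; that is the sense in which the expected income is infinite.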

Appealing to intuition, Bernoulli says that the cash value of a
person’s wealth is not its true, or moral, worth to him. Thus,
according to Bernoulli, the dollar that might be precious to a pauper
would be nearly worthless to a millionaire, or, better, to the pauper
himself were he to become a millionaire. Bernoulli then postulates that
people do seek to maximize the expected value of moral worth, or what
has been called moral expectation.

Operationally, the moral worth of a person’s wealth, so far as it
concerns behavior in the face of uncertainty, is just what I would call
the utility of the wealth, and moral expectation is expectation of
utility. It seems mystical, however, to talk about moral worth apart
from probability and, having done so, doubly mystical to postulate that
this undefined quantity serves as a utility. These obvious criticisms
have naturally led many to discredit the very idea of utility, but
§§ 2-4 show (following von Neumann and Morgenstern) that there is a
more cogent, though not altogether unobjectionable, path to that
concept.

† Daniel refers to this Nicholas Bernoulli as his uncle, but, in view of
dates mentioned in the last section of Daniel’s paper and the genealogy
in Chapter 8 of [B9], I think he must have meant his elder cousin
(1687-1759), perhaps using “uncle” as a term of deference.

Bernoulli argued, elaborating the example of the pauper and the
millionaire, that a fixed increment of cash wealth typically results in
an ever smaller increment of moral wealth as the basic cash wealth to
which the increment applies is increased. He admitted the possibility
of examples in which this law of diminishing marginal utility, as it has
come to be called in the literature of economics, might fail. For
example, a relatively small sum might be precious to a wealthy prisoner
who required it to complete his ransom. But Bernoulli insisted that
such examples are unusual and that as a general rule the law may be
assumed. In mathematical terms, the law says that utility as a function
of money is a concave (i.e., the negative of a convex) function.†
It follows from the basic inequality concerning convex functions
(Theorem 1 of Appendix 2) that a person to whom the law of diminishing
marginal utility applies will always prefer the status quo to any fair
gamble, that is, to any random act for which the change in his expected
wealth is zero; and that he will always be willing to pay something in
addition to its actuarial, or expected, value for insurance against any
loss to himself. The law of diminishing marginal utility has been very
popular, and few who have considered utility since Bernoulli have
discarded it, or even realized that it was not necessarily part and
parcel of the utility idea. Of course, the law has been embraced eagerly
and uncritically by those who have a moral aversion to gambling.
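The two consequences of concavity just stated can be illustrated with a small computation. The sketch below takes the logarithm as a stand-in concave utility (an assumption for illustration only) and uses hypothetical function names; it evaluates a fair gamble against the status quo and finds the largest premium worth paying for full insurance against a loss.

```python
import math


def expected_utility(lottery, U=math.log):
    """Expected utility of a lottery given as (wealth, probability) pairs."""
    return sum(p * U(w) for w, p in lottery)


def certainty_equivalent(lottery, U=math.log, U_inv=math.exp):
    """Sure wealth whose utility equals the lottery's expected utility."""
    return U_inv(expected_utility(lottery, U))


def max_premium(wealth, loss, prob, U=math.log, U_inv=math.exp):
    """Largest premium worth paying for full insurance against a loss
    of the given size that occurs with the given probability."""
    uninsured = [(wealth - loss, prob), (wealth, 1.0 - prob)]
    return wealth - certainty_equivalent(uninsured, U, U_inv)


fair_gamble = [(1100.0, 0.5), (900.0, 0.5)]   # expected wealth stays at 1000
```

Here the fair gamble has lower expected utility than the sure wealth of 1000, and a person with wealth 1000 facing a loss of 500 with probability 1/10 would pay more than the actuarial value of 50 to insure. Replacing the logarithm by a convex function reverses both conclusions.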

† Often the meanings of “convex” and “concave” as applied to functions
are interchanged. A function is here called convex if it appears
convex, in the ordinary sense of the word, when viewed from below. Such
a function is, of course, also concave from above, whence the
confusion. Cf. Appendix 2.

Bernoulli went further than the law of diminishing marginal utility
and suggested that the slope of utility as a function of wealth might,
at least as a rule of thumb, be supposed, not only to decrease with, but
to be inversely proportional to, the cash value of wealth. This, he
pointed out, is equivalent to postulating that utility is equal to the
logarithm (to any base) of the cash value of wealth. To this day, no
other function has been suggested as a better prototype for Everyman’s
utility function. None the less, as Cramer pointed out in his
aforementioned letter, the logarithm has a serious disadvantage; for, if
the logarithm were the utility of wealth, the St. Petersburg paradox
could be amended to produce a random act with an infinite expected
utility (i.e., an infinite expected logarithm of income) that, again, no
one would really prefer to the status quo. To take a less elaborate
example, suppose that a man’s total wealth, including an appraisal of
his future earning power, were a million dollars. If the logarithm of
wealth were actually his utility, he would as soon as not flip a coin to
decide whether his wealth should be changed to ten thousand dollars
(roughly $500 per year) or a hundred million dollars. This seems
preposterous to me. At any rate, I am sure you can construct an example
along the same lines that will seem preposterous to you. Cramer
therefore concluded, and I think rightly, that the utility of cash must
be bounded, at least from above. It seems to me that a good argument can
also be adduced for supposing utility to be bounded from below, for,
however wealth may be interpreted, we all subject our total wealth to
slight jeopardy daily for the sake of a large probability of avoiding
more moderate losses. But the logarithm is unbounded both from above
and from below; so, though it might be a reasonable approximation to
a person’s utility in a moderate range of wealth, it cannot be taken
seriously over extreme ranges.
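The coin-flip example can be checked in a few lines. The sketch below confirms that a logarithmic utility is exactly indifferent between a sure million dollars and an even chance of ten thousand or a hundred million. For contrast it also evaluates one hypothetical bounded utility, 1 - exp(-wealth / 1,000,000), which is an assumption chosen only for illustration and is not proposed in the text.

```python
import math


def log_utility(wealth):
    """Bernoulli's logarithmic utility (base 10 for readability)."""
    return math.log10(wealth)


def bounded_utility(wealth, scale=1_000_000.0):
    # Hypothetical utility bounded above by 1; an illustrative assumption.
    return 1.0 - math.exp(-wealth / scale)


def coin_flip_utility(low, high, U):
    """Expected utility of a fair coin flip between two wealth levels."""
    return 0.5 * U(low) + 0.5 * U(high)
```

Under the logarithm the flip between 10^4 and 10^8 has expected utility (4 + 8)/2 = 6, the same as the utility of 10^6. Under the bounded alternative the flip is worth decidedly less than the status quo, which accords with the reaction that the gamble is preposterous.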

Bernoulli’s ideas were accepted wholeheartedly by Laplace [L1], who
was very enthusiastic about the applications of probability to all sorts
of decision problems. It is my casual impression, however, that from
the time of Laplace until quite recently the idea of utility did not
strongly influence either mathematical or practical probabilists.

For a long period economists accepted Bernoulli’s idea of moral
wealth as the measurement of a person’s well-being apart from any
consideration of probability. Though “utility” rather than “moral
worth” has been the popular name for this concept among English-speaking
economists, it is my impression that Bernoulli’s paper is the
principal, if not the sole, source of the notion for all economists,
though the paper itself may often have been lost sight of. Economists
were for a time enthusiastic about the principle of diminishing marginal
utility, and they saw what they believed to be reflections of it in many
aspects of everyday life. Why else, to paraphrase Alfred Marshall
(pp. 19, 95 of [M2]), does a poor man walk in a rain that induces a rich
man to take a cab?

During the period when the probability-less idea of utility was
popular with economists, they referred not only to the utility of money,
but also to the utility of other consequences such as commodities (and
services) and combinations (or, better, patterns of consumption) of
commodities. The theory of choice among consequences was expressed by
the idea that, among the available consequences, a person prefers those
that have the highest utility for him. Also, the idea of diminishing
marginal utility was extended from money to other commodities.

The probability-less idea of utility in economics has been completely
discredited in the eyes of almost all economists, the following argument
against it (originally advanced by Pareto on pp. 158-159 and the
Mathematical Appendix of [P1]) being widely accepted. If utility is
regarded as controlling only consequences, rather than acts, it is not
true (as it is when acts, or at least gambles, are considered and the
formal definition in § 3 is applied) that utility is determined except
for a linear transformation. Indeed, confining attention to
consequences, any strictly monotonically increasing function of one
utility is another utility. Under these circumstances there is little,
if any, value in talking about utility at all, unless, of course,
special economic considerations should render one utility, or say a
linear family of utilities, of particular interest. That possibility
remains academic to date, though one attempt to exploit it was made by
Irving Fisher, as is briefly discussed in the paragraph leading to
Footnote 155 of [S18]. In particular, utility as a function of wealth
can have any shape whatsoever in the probability-less context, provided
only that the function in question is increasing with increasing wealth,
the provision following from the casual observation that almost nobody
throws money away. The history of probability-less utility has been
thoroughly reported by Stigler [S18].

What, then, becomes of the intuitive arguments that led to the notion
of diminishing marginal utility? To illustrate, consider the poor
man and the rich man in the rain. Those of us who consider diminishing
marginal utility nonsensical in this context think it sufficient to
say simply that it is a common observation that rich men spend money
freely to avoid moderate physical suffering whereas poor men suffer
freely rather than make corresponding expenditures of money; in other
terms, that the rate of exchange between circumstances producing
physical discomfort and money depends on the wealth of the person
involved.

In recent years there has been revived interest in Bernoulli’s ideas
of utility in the technical sense of §§ 2-4, that is, as a function
that, so to speak, controls decisions among acts, or at least gambles.
Ramsey’s essays in [R1], which in spirit closely resemble the first five
chapters of this book, present a relatively early example of this
revival of interest. Ramsey improves on Bernoulli in that he defines
utility operationally in terms of the behavior of a person constrained
by certain postulates. Ramsey’s essays, though now much appreciated,
seem to have had relatively little influence.

Between the time of Ramsey and that of von Neumann and Morgenstern
there was interest in breaking away from the idea of maximizing
expected utility, at least so far as economic theory was concerned (cf.
[T1a]). This trend was supported by those who said that Bernoulli gives
no reason for supposing that preferences correspond to the expected
value of some function, and that therefore much more general
possibilities must be considered. Why should not the range, the
variance, and the skewness, not to mention countless other features, of
the distribution of some function join with the expected value in
determining preference? The question was answered by the construction
of Ramsey and again by that of von Neumann and Morgenstern, which has
been slightly extended in §§ 2-4; it is simply a mathematical fact that,
almost any theory of probability having been adopted and the sure-thing
principle having been suitably extended, the existence of a function
whose expected value controls choices can be deduced. That does not
mean that as a theory of actual economic behavior the theory of utility
is absolutely established and cannot be overthrown. Quite the contrary,
it is a theory that makes factual predictions many of which can
easily be observed to be false, but the theory may have some value in
making economic predictions in certain contexts where the departures
from it happen not to be devastating. Moreover, as I have been arguing,
it may have value as a normative theory.

Von Neumann and Morgenstern initiated among economists and, to
a lesser extent, also among statisticians an intense revival of interest
in the technical utility concept by their treatment of utility, which
appears as a digression in [V4].

The von Neumann-Morgenstern theory of utility has produced this
reaction, because it gives strong intuitive grounds for accepting the
Bernoullian utility hypothesis as a consequence of well-accepted maxims
of behavior. To give readers of this book some idea of the von
Neumann-Morgenstern theory, I may repeat that the treatment of utility
as applied to gambles presented in § 3 is virtually copied from their
book [V4]. Indeed, their ideas on this subject are responsible for
almost all of my own. One idea now held by me that I think von Neumann
and Morgenstern do not explicitly support, and that so far as I know
they might not wish to have attributed to them, is the normative
interpretation of the theory.

Of course, much of the new interest in utility takes the form of
criticism and controversy. The greater part of this discussion that has
come to my attention has not yet been published. A list of references
leading to most of that which has is [B7], [W14], [S1], [C4], [F13],
[A2].

I shall successively discuss each of the recent major criticisms of the
modern theory of utility known to me. My method in each case will
be first to state the criticism in a form resembling those in which it
is typically put forward, regardless of whether I consider that form
well chosen. I will then discuss the criticism, elaborating its meaning
and indicating its rebuttal, when there seems to me to be one.

(a) Modern economic theorists have rigorously shown that there is
no meaningful measure of utility. More specifically, if any function U
fulfills the role of a utility, then so does any strictly monotonically
increasing function of U. It must, therefore, be an error to conclude
that every utility is a linear function of every other.

This argument has been advanced with a seriousness that is surprising,
considering that it concedes little intelligence or learning to the
proponents of the utility theory under discussion and considering that
it results, as will immediately be explained, from the baldest sort of a
terminological confusion. To be fair, I must go on to say that I have
never known the argument to be defended long in the presence of the
explanation I am about to give.

In ordinary economic usage, especially prior to the work of von
Neumann and Morgenstern, a utility associated with gambles would
presumably be simply a function U associating numbers with gambles in
such a way that f ≤ g, if and only if U(f) ≤ U(g); though economic
discussion of utility was, prior to von Neumann and Morgenstern, almost
exclusively confined to consequences rather than to gambles or
to acts. It is unequivocally true, as I have already brought out, that
any monotonic function of a utility in this wide classical sense is
itself a utility. What von Neumann and Morgenstern have shown, and
what has been recapitulated in § 3, is that, granting certain
hypotheses, there exists at least one classical utility V satisfying the
very special condition

(2)  $$V(\alpha f + \beta g) = \alpha V(f) + \beta V(g),$$

where f and g are any gambles and α, β are non-negative numbers such
that α + β = 1. Furthermore, if I may for the moment call a classical
utility satisfying (2) a von Neumann-Morgenstern utility, every von
Neumann-Morgenstern utility is an increasing linear function of every
other. To put the point differently, the essential conclusion of the von
Neumann-Morgenstern utility theory is that (2) can be satisfied by a
classical utility, but not by very many. The confusion arises only
because von Neumann and Morgenstern use the already pre-empted word
“utility” for what I here call “von Neumann-Morgenstern utility.”
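The distinction between a classical utility and one satisfying (2) can be made concrete. In the Python sketch below, gambles are lists of (consequence, probability) pairs and a utility of cash is extended to gambles by taking its expectation, which is what makes (2) hold. The square-root utility, the particular gambles, and all names are hypothetical choices for illustration.

```python
import math


def expected_utility(gamble, utility):
    """Expected utility of a gamble given as (consequence, probability) pairs."""
    return sum(p * utility(c) for c, p in gamble)


def prefers(gamble_a, gamble_b, utility):
    """True if gamble_a has the higher expected utility."""
    return expected_utility(gamble_a, utility) > expected_utility(gamble_b, utility)


def V(c):
    # Hypothetical utility of a cash consequence.
    return math.sqrt(c)


def linear(c):
    # An increasing linear function of V.
    return 3.0 * V(c) + 2.0


def monotone(c):
    # Strictly increasing in V, but not linear in V.
    return V(c) ** 2


sure_36 = [(36.0, 1.0)]
coin_flip = [(0.0, 0.5), (100.0, 0.5)]
```

All three functions order the sure consequences 0, 36, and 100 identically, so each is a utility in the classical sense for consequences. Only V and its linear transform agree about the gambles: both prefer the sure 36 to the coin flip, whereas the monotone transform reverses that preference.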
In retrospect, that choice of word seems to have been a mistake in
tactics, but one of no long-range importance.

(b) The postulates leading to the von Neumann-Morgenstern concept
of utility are arbitrary and gratuitous.

Such a view can, of course, always be held without the slightest fear
of rigorous refutation, but a critic holding it might perhaps be
persuaded away from it by a reformulation of the postulates that he
might find more appealing than the original set, or by illuminating
examples. In particular, P1-7 are quite different from, but imply, the
postulates of von Neumann and Morgenstern. Incidentally, the main
function of the von Neumann-Morgenstern postulates themselves is to put
the essential content of Daniel Bernoulli’s “postulate” into a form that
is less gratuitous in appearance. At least one serious critic, who had
at first found the system of von Neumann and Morgenstern gratuitous,
changed his mind when the possibility of deriving certain aspects of
the system from the sure-thing principle was pointed out to him.

(c) The sure-thing principle goes too far. For example, if two
lotteries with cash prizes (not necessarily positive) are based on the
same set of lottery tickets and so arranged that the prize that will be
assigned to any ticket by the second lottery is at least as great as the
prize assigned to that ticket by the first lottery, then there is no
doubt that virtually any person would find a ticket in the first lottery
not preferable to the same ticket in the second lottery. If, however,
the prizes in each lottery are themselves lottery tickets, such that the
prize associated with any ticket in the first lottery is not preferred
by the person under study to the prize associated with the same ticket
by the second lottery, the conclusion that the person will not prefer a
ticket in the first lottery to the same ticket in the second is no
longer compelling.

This point resembles the preceding one in that the intuitive appeal
of an assumption can at most be indicated, not proved. I do think it
cogent, however, to stress in connection with this particular point that
a cash prize is to a large extent a lottery ticket in that the
uncertainty as to what will become of a person if he has a gift of a
thousand dollars is not in principle different from the uncertainty
about what will become of him if he holds a lottery ticket of
considerable actuarial value.

Perhaps an adherent to the criticism in question would think it
relevant to reply thus: Though cash sums are indeed essentially lottery
tickets, a sum of money is worth at least as much to a person as a
smaller sum, in a peculiarly definite and objective sense, because money
can, if one desires, always be quickly and quietly thrown away, thereby
making any sum available to a person who already has a larger sum.
But I have never heard that reply made, nor do I here plead its cogency.

(d) An actual systematic deviation from the sure-thing principle and,
with it, from the von Neumann-Morgenstern theory of utility, can be
exhibited. For example, a person might perfectly reasonably prefer to
subsist on a packet of Army K rations per meal rather than on two ounces
of the best caviar per meal. It is then to be expected, according to the
sure-thing principle, that the person would prefer the K rations to a
lottery ticket yielding the K rations with probability 9/10 and the
caviar diet with probability 1/10. That expectation is no doubt
fulfilled, if the lottery is understood to determine the person’s
year-long diet once and for all. But, if the person is able to have at
each meal a lottery ticket offering him the K rations or the caviar with
the indicated probabilities, it is not at all unlikely, granting that he
likes caviar and has some storage facilities, that he will prefer this
“lottery diet.” This conclusion is in defiance of the principle that
“the theory of consumer demand is a static theory.” (Cf. [W14].)

I admit that the theory of utility is not static in the indicated sense,
as the foregoing example conclusively shows. But there is not the
slightest reason to think of a lottery producing either a steady diet of
caviar or a steady diet of K rations as being the same lottery as one
having a multitude of different prizes almost all of which are mixed
chronological programs of caviar and K rations. The fact that a theory
of consumer behavior in riskless situations happens to be static in the
required sense (under certain special assumptions about storability and
the linearity of prices) is no argument at all that the theory of
consumer behavior in risky circumstances should be static in the same
sense (as I mention in a note appended to [W14]).

(e) If the von Neumann-Morgenstern theory of utility is not static,
it is not subject to repeated empirical observation and is therefore
vacuous. (Cf. [W14].)

I think the discussion in § 3.1 of how to determine the preferences of
a hot man for a swim, a shower, and a glass of beer, and the discussion
in § 5 of the practicality of identifying pseudo-microcosms are steps
toward showing how the theory can be put to empirical test without
making repeated trials on any one person.

(f) Casual observation shows that real people frequently and
flagrantly behave in disaccord with the utility theory, and that in fact
behavior of that sort is not at all typically considered abnormal or
irrational.

Two different topics call for discussion under this heading. In the
first place, it is undoubtedly true that the behavior of people does
often flagrantly depart from the theory. None the less, all the world
knows from the lessons of modern physics that a theory is not to be
altogether rejected because it is not absolutely true. It seems not
unreasonable to suppose, and examples could easily be cited to confirm,
that in the extremely complicated subject of the behavior of people very
crude theory can play a useful role in certain contexts.

Second, many apparent exceptions to the theory can be so reinterpreted
as not to be exceptions at all. For example, a flier may be observed
doing a stunt that risks his life, apparently for nothing. That
seems to be in complete violation of the theory; but, if in addition it is
known that the flier has a real and practical need to convince certain
colleagues of his courage, then he is simply paying for advertising with
the risk of his life, which is not in itself in contradiction to the theory.
Or, suppose that it were known more or less objectively that the flier
has a need to demonstrate his own courage to himself. The theory
would again be rescued, but this time perhaps not so convincingly as
before. In general, the reinterpretation needed to reconcile various
sorts of behavior with the utility theory is sometimes quite acceptable
and sometimes so strained as to lay whoever proposes it open to the
charge of trying to save the theory by rendering it tautological. The
same sort of thing arises in connection with many theories, and I think
there is general agreement that no hard-and-fast rule can be laid down
as to when it becomes inappropriate to make the necessary
reinterpretation. For example, the law of the conservation of energy (or its
atomic age variant, the law of the conservation of mass and energy)
owes its success largely to its being an expression of remarkable and
reliable facts of nature, but to some extent also to certain conventions
by which new sorts of energy are so defined as to keep the law true.
A stimulating discussion of this delicate point in connection with the
theory of utility is given by Samuelson in [S1].

(g) Introspection about certain hypothetical decision situations
suggests that the sure-thing principle and, with it, the theory of utility
are normatively unsatisfactory. Consider an example based on two
decision situations each involving two gambles.†

Situation 1. Choose between

    Gamble 1. $500,000 with probability 1; and

    Gamble 2. $2,500,000 with probability 0.1,
              $500,000 with probability 0.89,
              status quo with probability 0.01.

Situation 2. Choose between

    Gamble 3. $500,000 with probability 0.11,
              status quo with probability 0.89; and

    Gamble 4. $2,500,000 with probability 0.1,
              status quo with probability 0.9.

† This particular example is due to Allais [A2]. Another interesting
example was presented somewhat earlier by Georges Morlat [C4].

Many people prefer Gamble 1 to Gamble 2, because, speaking
qualitatively, they do not find the chance of winning a very large fortune in
place of receiving a large fortune outright adequate compensation for
even a small risk of being left in the status quo. Many of the same
people prefer Gamble 4 to Gamble 3, because, speaking qualitatively,
the chance of winning is nearly the same in both gambles, so the one
with the much larger prize seems preferable. But the intuitively
acceptable pair of preferences, Gamble 1 preferred to Gamble 2 and
Gamble 4 to Gamble 3, is not compatible with the utility concept or,
equivalently, the sure-thing principle. Indeed that pair of preferences implies
the following inequalities for any hypothetical utility function U:

(3)   U($500,000) > 0.1 U($2,500,000) + 0.89 U($500,000) + 0.01 U($0),

      0.1 U($2,500,000) + 0.9 U($0) > 0.11 U($500,000) + 0.89 U($0),

and these are obviously incompatible: subtracting 0.89 U($500,000) from
both sides of the first, and 0.89 U($0) from both sides of the second,
gives two inequalities that assert opposite orderings of the same two
quantities.
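
The incompatibility can also be checked numerically. The following is a minimal sketch, not part of the original argument: the names GAMBLES, expected_utility, and advantage, and the particular utility values, are illustrative assumptions. It shows that, whatever utilities are assigned to the three prizes, the expected-utility advantage of Gamble 1 over Gamble 2 equals that of Gamble 3 over Gamble 4, so the two preferences cannot point in opposite directions.

```python
from fractions import Fraction as Fr

# Prizes in dollars; 0 stands for the status quo.
GAMBLES = {
    1: {500_000: Fr(1)},
    2: {2_500_000: Fr(10, 100), 500_000: Fr(89, 100), 0: Fr(1, 100)},
    3: {500_000: Fr(11, 100), 0: Fr(89, 100)},
    4: {2_500_000: Fr(10, 100), 0: Fr(90, 100)},
}


def expected_utility(gamble, u):
    """Expected utility of a gamble given as {prize: probability}."""
    return sum(p * u[prize] for prize, p in gamble.items())


def advantage(a, b, u):
    """E[U(Gamble a)] - E[U(Gamble b)] under the utility assignment u."""
    return expected_utility(GAMBLES[a], u) - expected_utility(GAMBLES[b], u)


# Any utility assignment will do; this one is purely illustrative.
u = {0: Fr(0), 500_000: Fr(10), 2_500_000: Fr(12)}

# The two differences coincide, so Gamble 1 is preferred to Gamble 2
# exactly when Gamble 3 is preferred to Gamble 4.
print(advantage(1, 2, u), advantage(3, 4, u))
```

Exact fractions are used so that the equality of the two differences is not obscured by rounding.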

Examples‡ like the one cited do have a strong intuitive appeal; even
if you do not personally feel a tendency to prefer Gamble 1 to Gamble 2
and simultaneously Gamble 4 to Gamble 3, I think that a few trials
with other prizes and probabilities will provide you with an example
appropriate to yourself.

‡ Allais has announced (but not yet published) an empirical investigation
of the responses of prudent, educated people to such examples [A2].

If, after thorough deliberation, anyone maintains a pair of distinct
preferences that are in conflict with the sure-thing principle, he must
abandon, or modify, the principle; for that kind of discrepancy seems
intolerable in a normative theory. Analogous circumstances forced
D. Bernoulli to abandon the theory of mathematical expectation for
that of utility [B10]. In general, a person who has tentatively accepted
a normative theory must conscientiously study situations in which the
theory seems to lead him astray; he must decide for each by reflection
— deduction will typically be of little relevance — whether to retain his
initial impression of the situation or to accept the implications of the
theory for it.

To illustrate, let me record my own reactions to the example with
which this heading was introduced. When the two situations were
first presented, I immediately expressed preference for Gamble 1 as
opposed to Gamble 2 and for Gamble 4 as opposed to Gamble 3, and I
still feel an intuitive attraction to those preferences. But I have since
accepted the following way of looking at the two situations, which
amounts to repeated use of the sure-thing principle.

One way in which Gambles 1-4 could be realized is by a lottery with
a hundred numbered tickets and with prizes according to the schedule
shown in Table 1.


Table 1. Prizes in units of $100,000 in a lottery realizing
Gambles 1-4

                                 Ticket number
                               1     2-11   12-100

    Situation 1   Gamble 1     5       5       5
                  Gamble 2     0      25       5

    Situation 2   Gamble 3     5       5       0
                  Gamble 4     0      25       0


Now, if one of the tickets numbered from 12 through 100 is drawn, it
will not matter, in either situation, which gamble I choose. I therefore
focus on the possibility that one of the tickets numbered from 1 through
11 will be drawn, in which case Situations 1 and 2 are exactly parallel.
The subsidiary decision depends in both situations on whether I would
sell an outright gift of $500,000 for a 10-to-1 chance to win $2,500,000 —
a conclusion that I think has a claim to universality, or objectivity.
Finally, consulting my purely personal taste, I find that I would prefer
the gift of $500,000 and, accordingly, that I prefer Gamble 1 to Gamble
2 and (contrary to my initial reaction) Gamble 3 to Gamble 4.
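
The ticket argument can be traced mechanically. The sketch below is only an illustration under the prize schedule of Table 1; the function name prize and the variables low and high are hypothetical. It checks that on tickets 12 through 100 the choice within either situation is immaterial, and that on tickets 1 through 11 the two situations present the same pair of options.

```python
def prize(gamble, ticket):
    """Prize, in units of $100,000, paid by a gamble for a given ticket."""
    if not 1 <= ticket <= 100:
        raise ValueError("ticket must be between 1 and 100")
    if ticket == 1:
        return {1: 5, 2: 0, 3: 5, 4: 0}[gamble]
    if ticket <= 11:
        return {1: 5, 2: 25, 3: 5, 4: 25}[gamble]
    return {1: 5, 2: 5, 3: 0, 4: 0}[gamble]


low = range(1, 12)     # tickets 1 through 11
high = range(12, 101)  # tickets 12 through 100

# On tickets 12-100 the choice within either situation makes no difference.
same_within_1 = all(prize(1, t) == prize(2, t) for t in high)
same_within_2 = all(prize(3, t) == prize(4, t) for t in high)

# On tickets 1-11 both situations offer exactly the same pair of options.
parallel = all(
    (prize(1, t), prize(2, t)) == (prize(3, t), prize(4, t)) for t in low
)
print(same_within_1, same_within_2, parallel)
```

All three checks hold, which is the formal content of the statement that only the subsidiary decision on tickets 1 through 11 matters.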

It seems to me that in reversing my preference between Gambles 3
and 4 I have corrected an error. There is, of course, an important sense
in which preferences, being entirely subjective, cannot be in error; but
in a different, more subtle sense they can be. Let me illustrate by a
simple example containing no reference to uncertainty. A man buying
a car for $2,134.56 is tempted to order it with a radio installed, which
will bring the total price to $2,228.41, feeling that the difference is
trifling. But, when he reflects that, if he already had the car, he
certainly would not spend $93.85 for a radio for it, he realizes that he has
made an error.

One thing that should be mentioned before this chapter is closed is
that the law of diminishing marginal utility plays no fundamental role
in the von Neumann-Morgenstern theory of utility, viewed either
empirically or normatively. Therefore the possibility is left open that
utility as a function of wealth may not be concave, at least in some
intervals of wealth. Some economic-theoretical consequences of
recognition of the possibility of non-concave segments of the utility function
have been worked out by Friedman and myself [F12], and by Friedman
alone [F11]. The work of Friedman and myself on this point is
criticized by Markowitz [M1].



CHAPTER 6 




Observation 

1 Introduction 

With the construction of utility, the theory of decision in the face
of uncertainty is, in a sense, complete. I have no further postulates
to propose, and those I have proposed have been shown to be
equivalent to the assumption that the person always decides in favor of an
act the expected utility of which is as large as possible, supposing for
simplicity that only a finite number of acts are open to him. At the
level of generality that has led to this conclusion there seems to be
little or nothing left to say. To go further now means to go into more
detail, to investigate special types of decision problems. One type of
decision problem of central importance is that in which the person is
called upon to make an observation and then to choose some act in the
light of the outcome of the observation.

The consideration of such observational decision problems is a step
toward those problems of great interest for statistics in which the
person must decide what observation to make, that is, of course, what to
look at, not what to see. They are the problems of designing
experiments and other observational programs.

Some remarks on observation were made in Chapter 3, but only now
that the theory of utility is established is it possible to give a relatively
complete analysis of the concept.

Observation is a concept essential to the study of statistics proper,
most of what has been said thus far being preliminary to, but not really
part of, statistics; even after this chapter and the next one, on
observation, there will still remain a major transition. One important
feature of much of what is ordinarily called statistics is, according to
my analysis, concerned with the behavior not of an isolated person, but
of a group of persons acting, for example, in concert. In later chapters
I will deal, so far as I am able, with the problem of group action, but
preliminary considerations bearing on it will be made and pointed out
from time to time in this chapter and the next.

Though the details of these two chapters may seem mathematically
forbidding, drastic simplifying assumptions are made in them to keep
extraneous difficulties to a minimum. These typically take the form
of assuming that certain sets of acts, events, and values of random
variables are finite. Even in elementary applications of the theory, these
simplifying assumptions seldom actually hold. In some contexts, it is
quite elementary to relax them sufficiently; in others, serious
mathematical effort has been required; and some are still at the frontier of
research. Relaxations of the assumptions will be touched on from time
to time, sometimes explicitly but sometimes only implicitly in the choice
of suggestive notation and nomenclature.

Beyond this introduction, the present chapter is divided into four
sections: § 2 analyzes informally and then formally the notion of a
cost-free observation; §§ 3 and 4 discuss certain obvious but important
conditions under which one observation, and similarly one set of acts, is
more valuable than another; § 5 abstractly discusses problems of
designing experiments or, perhaps more generally, observational programs.

2 What an observation is 

To begin with an informal survey of observation, consider a decision
problem, that is, a person faced with a decision among several acts.
Calling it the basic decision problem and the acts associated with it
the basic acts, a new decision problem would arise, if the person were
informed before he made his decision that a particular event, say B,
obtained. The new decision problem is related to the basic decision
problem in a simple way, for the acts associated with it are also the
basic acts, and the decision is to be made by computing the expected
utility given B of the basic acts and deciding on one that maximizes
the conditional expected utility. The basic problem may be modified
in still another, though closely related, way. Let the person say in
advance, for each possible i, which of the basic acts he will decide on
when he is informed, as he is to be, which element of a given
partition B_i obtains. This will be called the derived decision problem arising
from the basic decision problem and the observation of i, and its acts
will be called derived acts. Technically speaking, the derived acts are
determined by arbitrarily assigning one basic act to each element of
the partition. For any state s, the consequence of a derived act is the
consequence for s of the basic act associated with the particular B_i in
which s lies. The terms informally introduced in this paragraph are
defined formally later in the section.

A derived decision problem is not necessarily different in kind from
the basic problem; indeed it is quite possible that the basic problem can
itself be viewed as derived from some other basic problem and
observation.

Formidable though the description of a derived problem may seem
at first reading, its solution is, in a sense, easy and has already almost
been given; for it is clear that, if P(B_i) > 0, the person will decide to
associate with B_i a basic act the expected utility of which given B_i is
as high as possible; and, if P(B_i) = 0, it is immaterial to the person
which basic act is associated with B_i.

It is almost obvious that the value of a derived problem cannot be
less, and typically is greater, than the value of the basic problem from
which it is derived. After all, any basic act is among the derived acts,
so that any expected utility that can be attained by deciding on a basic
act can be attained by deciding on the same basic act considered as a
derived act. In short, the person is free to ignore the observation.
That obvious fact is the theory’s expression of the commonplace that
knowledge is not disadvantageous.

It sometimes happens that a real person avoids finding something
out or that his friends feel duty bound to keep something from him,
saying that what he doesn’t know can’t hurt him; the jealous spouse
and the hypochondriac are familiar tragic examples. Such apparent
exceptions to the principle that forewarned is forearmed call for
analysis. At first sight, one might be inclined to say that the person who
refuses freely proffered information is behaving irrationally and in
violation of the postulates. But perhaps it is better to admit that
information that seems free may prove expensive by doing psychological harm
to its recipient. Consider, for example, a sick person who is certain
that he has the best of medical care and is in a position to find out
whether his sickness is mortal. He may decide that his own personality
is such that, though he can continue with some cheer to live in the
fear that he may possibly die soon, what is left of his life would be
agony, if he knew that death were imminent. Under such circumstances,
far from calling him irrational, we might extol the person’s rationality,
if he abstained from the information. On the other hand, such an
interpretation may seem forced. (Cf. Criticism (f) of § 5.6.)

Examples of decisions based on observation are on every hand, but
it will be worth while to examine one in some detail before undertaking
an abstract mathematical analysis of such decisions. Any example
would have to be highly idealized for simplicity, because the complexity
of any real decision problem defies complete explicit description; but
particular simplicity is in order here.

The person in the example is considering whether to buy some of the
grapes he sees in a grocery store and, if so, in what quantity. To his
taste, the grapes may be of any of three qualities, poor, fair, and
excellent. Call the qualities Q generically and 1, 2, and 3 individually. From
what the person knows at the moment, including of course the
appearance of the grapes, he cannot be certain of their quality, but he attaches
personal probability to each of the three possibilities according to
Table 1.

Table 1. P(Q)

    Q(uality)          1      2      3
    P(robability)     1/4    1/2    1/4


The person can decide to buy 0, 1, 2, or 3 pounds of grapes; these
are the basic acts of the example. Taking one consideration with
another, he finds the consequences of each act, measured in utiles, in
each of the three possible events to be those given in the body of Table
2. The expected utilities in the right margin of Table 2 follow, of
course, from Table 1 and the body of Table 2.

Table 2. Utility f(Q) for each f and each Q

                     Q
    f          1     2     3     E(f)
    0          0     0     0       0
    1         -1     1     3       1
    2         -3     0     5      1/2
    3         -6    -2     6      -1

The entries in Table 2 have not been chosen haphazardly, but with
an attempt at verisimilitude. Thus it is supposed that, if the person
buys grapes of poor quality his dissatisfaction with the bargain will
accelerate rapidly with the amount bought, which seems reasonable,
especially if the keeping quality of poor grapes is low. He is, of course,
unaffected by the quality if he buys none. Again, buying a few fair
grapes may be mildly desirable, but overbuying is not. Finally,
excellent grapes are worth buying, even in large quantities, but the utility
of the purchase increases less than proportionally to the amount bought.

The correct solution of the basic decision problem is to buy 1 pound
of grapes, for that act has, according to the right margin of Table 2,
an expected utility of 1, which is the largest that can be attained.
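
The right margin of Table 2 is easy to recompute. The sketch below is illustrative only; the names P_Q, UTILITY, and expected_utility are assumptions, and the utility entries are those consistent with Tables 3 and 4 of this section.

```python
from fractions import Fraction as Fr

# Table 1: personal probability of each quality Q = 1, 2, 3.
P_Q = {1: Fr(1, 4), 2: Fr(1, 2), 3: Fr(1, 4)}

# Body of Table 2: utility in utiles of buying f pounds when the quality is Q.
UTILITY = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}


def expected_utility(f):
    """Expected utility of the basic act 'buy f pounds'."""
    return sum(P_Q[q] * UTILITY[f][q] for q in P_Q)


# Right margin of Table 2, and the basic act that maximizes it.
margin = {f: expected_utility(f) for f in UTILITY}
best = max(margin, key=margin.get)
print(margin, best)
```

The maximizing act is to buy 1 pound, with expected utility 1, in agreement with the text.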

Now, suppose the person is free to make an observation, that is, a
new observation in addition to those that may have contributed to the
determination of the probabilities in the basic problem. It may be, for
example, that the grocer invites him to eat a few of the grapes or that
the person is going to ask the woman beside him how they look to her.
Let there be five possible outcomes of his observation; call them x
generically and 1, 2, 3, 4, and 5 individually. I assume, though this
feature is rather incidental to the example, that low values of x tend
to be suggestive of low quality. The joint distribution of x and Q, that
is, the probability that x and Q simultaneously have any given pair of
values, is of central technical importance. Those probabilities, each
multiplied by 128 for simplicity of presentation, are given in the body
of Table 3. The right-hand and bottom margins of the table give,

Table 3. 128 P(x ∩ Q)

                      Q
    x          1     2     3     128 P(x)
    1         15     5     1        21
    2         10    15     2        27
    3          4    24     4        32
    4          2    15    10        27
    5          1     5    15        21

    128 P(Q)  32    64    32       128

also multiplied by 128, the probability of each value of x and of each
value of Q. The marginal entries are, of course, obtained by adding
rows and columns. As indicated in the lower right-hand corner of the
table, the probabilities assumed do indeed add up to 1, and the bottom
margin recapitulates Table 1.

Conditional probabilities can easily be read from Table 3. Thus, for
example, the conditional probability that x is 2, given that Q is 3, is
2/32; and the conditional probability that Q is 2, given that x is 4, is
15/27. It will be seen in later sections that the distribution of x given
Q is, in a sense, even more fundamental than the joint distribution of
x and Q.
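
The margins and conditional probabilities just described can be reproduced in a few lines. This is a minimal sketch under the entries of Table 3; the names JOINT_128, row_margin, col_margin, p_x_given_q, and p_q_given_x are hypothetical.

```python
from fractions import Fraction as Fr

# Body of Table 3: 128 * P(x and Q), rows x = 1..5, columns Q = 1..3.
JOINT_128 = {
    1: {1: 15, 2: 5, 3: 1},
    2: {1: 10, 2: 15, 3: 2},
    3: {1: 4, 2: 24, 3: 4},
    4: {1: 2, 2: 15, 3: 10},
    5: {1: 1, 2: 5, 3: 15},
}

# Margins obtained by adding rows and columns.
row_margin = {x: sum(row.values()) for x, row in JOINT_128.items()}
col_margin = {q: sum(JOINT_128[x][q] for x in JOINT_128) for q in (1, 2, 3)}


def p_x_given_q(x, q):
    """Conditional probability that the observation is x, given quality q."""
    return Fr(JOINT_128[x][q], col_margin[q])


def p_q_given_x(q, x):
    """Conditional probability that the quality is q, given observation x."""
    return Fr(JOINT_128[x][q], row_margin[x])


print(row_margin, col_margin)
print(p_x_given_q(2, 3), p_q_given_x(2, 4))
```

Fraction reduces its results to lowest terms, so the two conditional probabilities print in reduced form; they are equal to 2/32 and 15/27 respectively.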

There are 4^5 = 1,024 derived acts, since one of the four basic acts
can be assigned arbitrarily to each of the five possible outcomes of the
observation. It is an easy exercise, using Tables 2 and 3, to verify
Table 4, which shows the conditional expectation of the utility of each
basic act given each possible outcome of the observation. For each x,
the highest expected utility, given that value of x, has been marked
with an asterisk.

Table 4. E(f | x)

                               x
    f         1         2         3         4         5
    0       0/21*     0/27      0/32      0/27      0/21
    1      -7/21     11/27*    32/32*    43/27     49/21
    2     -40/21    -20/27      8/32     44/27*    72/21
    3     -94/21    -78/27    -48/32     18/27     74/21*

Thus, for example, only if x is 1 will the person refrain from buying
grapes altogether, and only if x is 5 will he risk buying 3 pounds. In
full, the best derived act, call it g, is to buy 0, 1, 1, 2, or 3 pounds, if x
is 1, 2, 3, 4, or 5, respectively. The value of the derived problem is the
expected value of g, which is computed thus:

(1)   E(g) = \sum_x E(g | x) P(x)
           = (0 + 11 + 32 + 44 + 74)/128
           = 161/128 \approx 1.26 utiles.

Since the value of the basic problem is 1 utile, the envisaged
observation is worth 0.26 utile; that is, the person would if necessary pay up
to 0.26 utile for the observation.
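
The computation leading to (1) can be retraced as follows. The sketch is only illustrative; UTILITY and JOINT_128 restate Tables 2 and 3, and the function name conditional_expectation is an assumption.

```python
from fractions import Fraction as Fr

# Body of Table 2: utility of buying f pounds when the quality is Q.
UTILITY = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}

# Body of Table 3: 128 * P(x and Q).
JOINT_128 = {
    1: {1: 15, 2: 5, 3: 1},
    2: {1: 10, 2: 15, 3: 2},
    3: {1: 4, 2: 24, 3: 4},
    4: {1: 2, 2: 15, 3: 10},
    5: {1: 1, 2: 5, 3: 15},
}


def conditional_expectation(f, x):
    """E(f | x), the entry of Table 4 for act f and outcome x."""
    row = JOINT_128[x]
    return Fr(sum(row[q] * UTILITY[f][q] for q in row), sum(row.values()))


# Best basic act for each outcome x (largest entry in each column of Table 4).
g = {
    x: max(UTILITY, key=lambda f: conditional_expectation(f, x))
    for x in JOINT_128
}

# Equation (1): E(g) is the sum over x of E(g | x) P(x).
value = sum(
    conditional_expectation(g[x], x) * Fr(sum(JOINT_128[x].values()), 128)
    for x in JOINT_128
)
print(g, value, float(value))
```

The best derived act found this way buys 0, 1, 1, 2, or 3 pounds as x runs from 1 to 5, and its expected utility is 161/128.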

Exercise 

1. Suppose that the person could directly observe the quality of the
grapes. Show that his best derived act would then yield 2 utiles, and
show that it could not possibly lead him to buy 2 pounds of the grapes.

The notion of a decision problem based on an observation will now be
formally described, with special reference to mathematical notation and
other technical details.

1. There is a set of basic acts, F with elements f, f', etc.

In the example of the grapes F consisted of the four envisaged acts
of buying 0, 1, 2, or 3 pounds of grapes.

The convention laid down at the end of § 5.4, requiring that the
consequences of acts be measured in utiles, will be adhered to, and it will
be supposed that v(F) is finite.

2. The observation is a (not necessarily real) random variable x
associating with each state s an observed value x(s) in some set X of
possible observed values x, x', etc.

In the example of the grapes, the states s (of which the postulates
require that there be an infinite number) were never fully described,
and consequently the random variable x was not fully described either.
In the same sense it may be said that the basic acts, which are also
really random variables, were not fully described either. All that is
really important, however, is to know the simultaneous distribution of
the consequences of the acts in F and of the values of x. In the example
of the grapes that information was implicit in Tables 2 and 3.

For mathematical simplicity in the formal work to follow, it will
generally be assumed that X has only a finite number of elements,
though the assumption can and must be relaxed in many practical
situations. When X is assumed finite, the random variable x is, for
all purposes of the present context, simply a partition of S, namely,
the partition into the sets on which x is constant. Indeed, earlier in
this section, the notion of observation was described in terms of a
partition, but the description in terms of a random variable is more familiar
in statistics and may have technical advantages, especially when the
restriction that X be finite is relaxed.

3. The set of strategy functions is the set of all functions associating
an element of F with each element x of X. Let the values of the generic
strategy function be denoted by f(x) and the function itself by f(x).

The notion of strategy function was not introduced in the informal
description of observation, nor in the example of the grapes, because
it is but a mathematical intermediary to the definition of derived acts
and did not seem to call for explicit expression in the less formal
contexts.

4. To each strategy function f(x) corresponds a derived act g, in the
set of all derived acts F(x), defined by

(2)   g(s) = f(s, x(s)) for all s ∈ S,

that is, the consequence of g for the state s is the consequence for s of
the basic act that the strategy function associates with the observed
value x(s).

It was explained that in the example of the grapes there are 4^5
derived acts. In the same way, it can be seen in general that if X has ξ
and F has φ elements there are φ^ξ derived acts.

5. The value of F given x,

(3)   v(F | x) =_{Df} \sup_{f \in F} E(f | x).

This is the function of x indicated, for the example of the grapes,
by the highest entry in each column of Table 4.
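
The formal definitions can be exercised on the example of the grapes by brute force. The sketch below is an illustration under the assumption that the example is fully summarized by Tables 2 and 3; the names ACTS, OUTCOMES, strategies, and expected_utility are hypothetical. It enumerates every strategy function, confirms that there are 4^5 of them, and finds the one whose derived act has the largest expected utility.

```python
from fractions import Fraction as Fr
from itertools import product

# Body of Table 2: utility of buying f pounds when the quality is Q.
UTILITY = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}

# Body of Table 3: 128 * P(x and Q).
JOINT_128 = {
    1: {1: 15, 2: 5, 3: 1},
    2: {1: 10, 2: 15, 3: 2},
    3: {1: 4, 2: 24, 3: 4},
    4: {1: 2, 2: 15, 3: 10},
    5: {1: 1, 2: 5, 3: 15},
}

ACTS = sorted(UTILITY)        # the set F of basic acts
OUTCOMES = sorted(JOINT_128)  # the set X of observed values


def expected_utility(strategy):
    """Expected utility of the derived act built from a strategy function.

    `strategy` maps each observed value x to the basic act chosen for it.
    """
    total = sum(
        JOINT_128[x][q] * UTILITY[strategy[x]][q]
        for x in OUTCOMES
        for q in JOINT_128[x]
    )
    return Fr(total, 128)


# Every strategy function: one basic act chosen freely for each x in X.
strategies = [
    dict(zip(OUTCOMES, choice))
    for choice in product(ACTS, repeat=len(OUTCOMES))
]

best = max(strategies, key=expected_utility)
print(len(strategies), best, expected_utility(best))
```

Exhaustive enumeration is feasible only because F and X are tiny here; in general the pointwise maximization expressed by (3) gives the same result without listing the derived acts.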

3 Multiple observations, and extensions of observations and of sets 
of acts 

If several random variables x_1, ..., x_n, associating elements of S
with elements of sets X_1, ..., X_n, are simultaneously under discussion,
it is natural to form the new random variable, denoted x = {x_1, ...,
x_n}, that associates with each element of S an ordered n-tuple of
elements of X_1, ..., X_n, respectively. If the context is such that x_1, ...,
x_n are thought of as observations, then x can also be thought of as an
observation and will sometimes be called a multiple observation to
emphasize the manner of its formation. To illustrate, any item such
as profession or body temperature that might be entered on a patient's
history can be thought of as an observation; but the whole history, or
a filing cabinet of histories, can also be thought of as an observation,
the history being a multiple observation of items, and the cabinet a
multiple observation of histories.

Consider two observations x and y. It is an interesting possibility
that x and y are so related to each other that knowledge of the value
of x would (almost certainly) imply (almost certain) knowledge of y.
In that case, observation of x implies essentially the observation of y
and generally something besides, which suggests the following three
definitions.

If and only if x and y are observations such that, for all s and s' in
some B of probability one, x(s) = x(s') implies y(s) = y(s'); then x is an
extension of y, and y is a contraction of x. If x is an extension of y,
and y is an extension of x, then x and y are equivalent.

Strictly speaking, one should say not that x and y are equivalent,
but rather that they are equivalent regarded as observations, for this
would not be a good concept of equivalence to apply to random
variables regarded as such. For example, a pair of equivalent observations
can obviously be a pair of real random variables with different expected
values. Some properties of the relations of extension, contraction, and
equivalence between observations are given by the following easy but
important exercises. Throughout this set of exercises it is unnecessary
to suppose the observations confined to a finite set of values; in the case
of Exercise 3b, it is impossible to do so.
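
For a finite set of states the definitions of extension and equivalence can be tested directly. The following sketch is an illustration, not part of the theory: the functions extends and equivalent, and the four-state example, are assumptions. States of probability zero are skipped, which on a finite set of states is the same as restricting attention to a set B of probability one.

```python
def extends(x, y, prob):
    """True if observation x is an extension of observation y.

    x, y: dicts mapping each state s to an observed value.
    prob: dict mapping each state s to its probability.
    """
    seen = {}
    for s, p in prob.items():
        if p == 0:
            continue  # states outside the set B of probability one
        # x(s) = x(s') must imply y(s) = y(s').
        if seen.setdefault(x[s], y[s]) != y[s]:
            return False
    return True


def equivalent(x, y, prob):
    """True if each observation is an extension of the other."""
    return extends(x, y, prob) and extends(y, x, prob)


# Hypothetical example with four equally probable states.
prob = {s: 0.25 for s in (1, 2, 3, 4)}
x = {s: s for s in prob}             # observes the state itself
y = {s: s % 2 for s in prob}         # observes only its parity
null = {s: 0 for s in prob}          # a null observation
shifted = {s: s + 10 for s in prob}  # same information as x, other values

print(extends(x, y, prob), extends(y, x, prob))
print(extends(y, null, prob), equivalent(x, shifted, prob))
```

The last pair illustrates the remark above: x and shifted are equivalent as observations although, as real random variables, they have different expected values.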

Exercises 

1. x and y are equivalent, if and only if x is both an extension and a
contraction of y.

2a. If P{x(s) = y(s)} = 1, x and y are equivalent.

2b. Any observation x is equivalent to itself.

3a. If there is a value y_0 such that P{y(s) = y_0} = 1, then every
x is an extension of y, and any two such observations are equivalent.
Such an observation, of course, amounts to observing nothing at all
and will therefore be called a null observation.

3b. If x(s) = s for almost all s ∈ S, then x extends every y.

4. If x is an extension of y, and y is an extension of z, then x is an
extension of z. State and verify the analogous fact about equivalence.

5a. If y' is a function associating an element of Y with each element
of X, and x is an observation, then the observation y such that y =
y'(x) is a contraction of x.

5b. If y is a contraction of x, then there is a function y' such that
P{y(s) = y'(x(s))} = 1. What freedom is there in the choice of the
function y'?

5c. What are the implications of Exercises 5a and 5b for equivalence
between observations?

6. If x and y are observations and z = {x, y} is the corresponding
double observation, then z is an extension of x and of y. (This exercise
seems to call for a converse saying that every extension can be regarded
as a double observation, but no really neat one suggests itself to me.
None the less, in thinking about extensions and contractions, the sort
brought out by the exercise is a typical and stimulating example.)

7. {x, y} is equivalent to x, if and only if x extends y.

The relations of extension, contraction, and equivalence have
parallels for sets of acts, defined thus:

If F and G are (non-vacuous) sets of acts such that, for some B of
probability one, there is for each g ∈ G an f ∈ F with f(s) = g(s) for all
s ∈ B, then F is an extension of G, and G is a contraction of F. If F is
an extension of G, and G is an extension of F, then F and G are
equivalent.

More exercises 

8. If F is an extension of (equivalent to) G, then v(F) ≥ (=) v(G).

9. Discuss the analogues of Exercises 1, 2b, and 4 for sets of
acts.

10. If F ⊃ G, then F extends G.

11. If F(x) is derived from F on observation of x, then F(x) extends
F.

12. Hyp.

    F(x) is derived from F on observation of x;
    F(y) is derived from F on observation of y;
    F(x, y) is derived from F on observation of {x, y};
    F(x; y) is derived from F(x) on observation of y.

Concl.

    1. F(x; y) is equivalent to F(x, y).
    2. F(x, y) extends F(x) and F(y).
    3. If x is equivalent to y, then F(x) is equivalent to F(y).
    4. If y extends x, then F(x, y) is equivalent to F(y), F(y) is
       equivalent to F(y; x), and F(y) extends F(x).

13a. Under the hypothesis of 12, the equivalences and relations of
extension among the sets of acts arising out of two observations can,
with evident conventions, be diagrammed thus:

    [Diagram relating F(x, y), F(x; y), F(y; x), F(x), and F(y)]

13b. If y extends x, the diagram becomes

    [Diagram]

13c. If x and y are equivalent, the diagram becomes

    [Diagram]

14. If F(x) and G(x) are derived from F and G, respectively, and if
F extends G, then F(x) extends G(x).

15. v(F(x)) = E[v(F | x)] = \int v(F | x(s)) dP(s) \ge v(F).

4 Dominance and admissibility

According to Exercise 3.14, if one set of acts, regarded as basic,
extends another, the first is at least as valuable as the second in the light
of any observation whatever. This section explores a relation,
dominance, which has the same property but is not so strict as extension.
Dominance is of some importance for the theory of personal probability
as it has been developed thus far. But its importance will be even
greater in the study of statistics proper, where interpersonal agreement
is of particular interest; for, as the definition shortly to be given will
make clear, two people having different personal probabilities will agree
as to whether one of two sets of acts dominates another, if only they
agree which events have probability zero — a condition generally met
in practice, and one that could if desired be dispensed with by a slight
change in the definition of dominance.

It will be seen that dominance and notions related to it are intimately
associated with the sure-thing principle. Indeed, probability being
taken for granted, the basic facts about dominance seem to give a
complete expression of the sure-thing principle. Dominance and related
concepts were much stressed by Wald, in [W3] for example.

Two or three notions, the logical connections among them, and those
between them and extension, are to be treated. The logical
connections being many but simple, I think that the material lends itself
better to formal than to expository treatment; for in such a context the
reader who looks for the motivating ideas sees them himself more easily
than he comprehends someone else's verbalization of them. This
section will therefore consist primarily of a group of formal definitions and
several exercises.

If and only if P(f(s) ≥ g(s)) = 1, f dominates g. If and only if some
(every) element of F dominates (is dominated by) g, F dominates (is
dominated by) g. If and only if F dominates every element of G,
F dominates G. If and only if f dominates g, but g does not dominate
f, f strictly dominates g. If and only if f ∈ F, and f is not strictly
dominated by any element of F, f is admissible (with respect to F).

Involving as they do acts as well as sets of acts, the definitions,
strictly speaking, introduce four different kinds of dominance. However,
this complexity can be alleviated, with a slight lapse of logic, by
identifying each act f with the set of acts of which f is the only element,
for it is easily seen that this identification is in such harmony with the
definition that, once it is made, the four kinds of dominance collapse
into one.
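The definitions above can be tried out on a small finite example. The following is only a sketch under assumptions that are not in the text: the states are finite and listed with their probabilities, each act is a tuple of utilities (one per state), and the names `dominates`, `strictly_dominates`, and `admissible` are introduced here purely for illustration.

```python
def dominates(f, g, prob):
    """True when P(f(s) >= g(s)) = 1, i.e. f >= g on every state of positive probability."""
    return all(fs >= gs for fs, gs, p in zip(f, g, prob) if p > 0)


def strictly_dominates(f, g, prob):
    """f dominates g, but g does not dominate f."""
    return dominates(f, g, prob) and not dominates(g, f, prob)


def admissible(F, prob):
    """Elements of F that are not strictly dominated by any element of F."""
    return [f for f in F if not any(strictly_dominates(g, f, prob) for g in F)]


# Hypothetical example: three states, the last of probability zero.
prob = [0.5, 0.5, 0.0]
F = [(1, 0, 9), (1, 0, 0), (0, 1, 0), (0, 0, 5)]

# The first two acts differ only on a null state, so each dominates the other.
mutual = dominates(F[0], F[1], prob) and dominates(F[1], F[0], prob)
survivors = admissible(F, prob)
```

In this illustration the act (0, 0, 5) is the only inadmissible one; its large payoff on the null state does not help it, which is the sense in which only agreement about events of probability zero matters.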

Exercises 

1a. Consider analogues of Exercises 3.2b and 3.4.

1b. When can two acts dominate each other?

2a. If F extends G, then F dominates G. Discuss the converse.

2b. F(x) dominates F.

2c. If $F \supset G$, then F dominates G.

3a. If $F \subset G$, and F dominates G, then all admissible elements of G
are contained in F.

3b. After any finite number of non-admissible elements is deleted
from F, what remains of any subset of F that dominated F continues to
dominate F.

3c. Though the set of admissible elements of F may in some instances
dominate F, no proper subset of the set of admissible elements can ever
do so; but, if any other subset dominates F, some proper subset of it
also does so.

3d. If F is finite, the set of admissible elements of F dominates F.

3e. Discuss the role of "finite" in 3b and 3d.

4a. If the set of admissible elements of F dominates G, and G dominates
F, then the set of admissible elements of F is equivalent to the
set of admissible elements of G.

4b. If F and G dominate each other, and either is finite, then the
sets of admissible elements of F and G, respectively, are equivalent to
each other, and each dominates both F and G.

5. If F dominates G, then $v(F) \ge v(G)$.

6. If F dominates G, then, for any observation x, F(x) dominates
G(x).

6 Outline of the design of experiments 

Often, especially in statistics, a decision problem can be seen as the
problem of deciding which of several experiments (or which of several
observational programs, if that is really a more general term) to
undertake.

In this section the notion of the decision problem derived from a
basic decision problem and an observation must be elaborated a little,
because, as derived acts have been treated thus far, they correspond to
the possibility of making an observation free of charge. Though
observations are sometimes free, there is typically a cost associated with
making them; information must typically be bought either from other
people or, more often, from nature, so to speak. The cost of information
may be money, trouble, one's own life, that of another, or any of
innumerable possibilities, but all can in principle be measured in terms
of utility. The cost of an observation in utility may be negative as
well as zero or positive; witness the cook that tastes the broth.

In principle, if a number of experiments are available to a person, he
has but to choose one whose set of derived acts has the greatest value
to him, due account being taken of the cost of observation. That simple
formulation, like some others in this book, is, in a sense, oversimple; it
abstracts from the enormous variety of considerations that enter into
the careful design of any experiment. The possibility of so abstracting
from variety does not remove the ultimate necessity of studying some
aspects of that variety in detail. R. A. Fisher's The Design of
Experiments [F4], for example, is concerned almost exclusively with
experiments based on a special technique called the analysis of variance,
and it is but an introduction to even that important facet of statistics.
Again, there is a growing literature (in which the work of A. Wald is
outstanding) on sequential analysis, which is concerned in principle with
all experiments in which later parts of the experiment are conducted in
the light of what happens in earlier parts, but this literature has, by
necessity, been confined to a relatively tiny part of that domain.

Before turning to a more formal recapitulation of the outline of the
design of experiments, this may be a good place for a few speculative
words about the difference, if any, between experiment and observation.

Some sciences are commonly called experimental as opposed to others
that are called observational. Aerodynamics, the psychology of rote
learning, and the genetics of fruit flies would typically be called
experimental sciences, and, to take parallel examples, meteorology, the
psychology of dreams, and human genetics would be called observational.
But it is widely agreed, and the most casual consideration makes it
clear, that any basic difference that may really be present resides not
in the sciences themselves but in the methods typical of each. To
illustrate the role of observation in sciences ordinarily considered
experimental and vice versa: observations of wild populations of fruit
flies have been useful in the study of the genetics of fruit flies; the
effects of fatigue, for example, on dream content may well be the subject
of an experiment; and, except for the atom, no topic in science is more
popular today than experimental rain making. The illustrations could be
extended indefinitely, and there is also a less direct sort exemplified by
the discipline called experimental medicine, which typically studies
experiments on animals with the hope, often justified, that the findings
thus obtained can be extrapolated to humans.

The problem, then, is to distinguish an experiment from an observation.
Except for brevity, it might be better to say mere observation,
for, in general usage, an experiment would be considered a special sort
of observation.

The first apparent contrast that comes to mind is that experimentation
is generally thought of as active and observation as passive. But,
upon examination, it is seen that observation is also active, for
observations are typically made by going somewhere to observe, or waiting
attentively till something happens. Often it is not only the observer
himself who must be transported and put in readiness to make an
observation, but also a considerable body of apparatus. What demands
more activity than the modern observation of a solar eclipse?

Another apparent contrast is that the experimenter acts on the thing
he observes, whereas the observer acts only on himself and on instruments
of observation that may be regarded as extensions of his own
sense organs. If this criterion were accepted altogether naively, there
would be no such thing as a physiological experiment on one's self;
even sophisticated interpretations might find it difficult to embrace
psychological experiments on one's self.

Finally, experiments as opposed to observations are commonly supposed
to be characterized by reproducibility and repeatability. But
the observation of the angle between two stars is easily repeatable and
with highly reproducible results, in double contrast to an experiment to
determine the effect of exploding an atomic bomb near a battleship.
All in all, however useful the distinction between observation and
experiment may be in ordinary practice, I do not yet see that it admits
of any solid analysis. At any rate, no formal use of the distinction will
be attempted in this book.

Return now to the notion of observation subject to cost. It may be
that the value of the random variable x is observable but only at a
cost c, a real-valued random variable measured in utiles. If, as
heretofore, F(x) denotes the set of acts derived from F on cost-free
observation of x, let $F(x) - c$ denote the set of derived acts subject to
the random cost c. This notation is interpreted to mean that, if f is the
generic element of F(x), then $f - c$ (which, being a utility-valued
function of s, is an act) is the generic act of the set $F(x) - c$. Very
often the cost of an observation is independent of s, but not, for
example, for him that tests the sharpness of a thorn with his finger.
Since observations are typically paid for before, or simultaneously with,
making the observation, the cost is typically observed along with the
observation proper. Put differently, the cost c is typically a contraction
of the observation x. Thus, if in some special context any advantage were
to be gained by so doing, it would not be drastic to assume the cost of
observing x to be a function of the form $c'(x)$; but, as a matter of fact,
no such advantage has come to my attention. It is not difficult to think
of experiments to which the assumption does not apply. For example, in
the present state of uncertainty about the long-term effects of x-rays,
anyone conducting a short-term experiment in which young human beings
were subjected to large doses of x-radiation would risk costs that
might not overtly manifest themselves for half a century, or even for
generations.

Much that would ordinarily be called observation cannot be described
by saying that the random cost is simply to be subtracted from each
derived act of the corresponding observation thought of as free of cost.
Allowing that it may be legendary, the form of trial by ordeal in which
the guilty floated safely to be hanged and the innocent drowned to be
exonerated epitomizes such a situation; except in point of absurdity,
ordinary industrial destructive testing of electric fuses and other
products is much the same. Strictly speaking, discrepancy occurs even in
the ordinary context in which the cost of observation is a fixed sum of
money, for the utility of money is not strictly linear, so the cost of
observation typically affects different derived acts somewhat differently.
This sort of situation is indeed so common as to introduce at least a
slight error into almost every application of the notion of cost as a
subtractive term. It would therefore be desirable to extend considerably
the notion of cost of observation, but, thus far, I see no way to do so
that does not destroy the mathematical advantage of singling problems
of observation out of the class of decision problems generally.

It is convenient now to analyze the appropriateness of regarding the
number v(F) as a measure of the value of F. As must already be clear
to the reader, if a person is to make a preliminary decision limiting his
next decision to one or another of several sets of acts, say, F, G, and H,
then his preliminary decision will select a set that has the highest value
of v, and the preliminary and secondary decisions, regarded as a single
grand decision, amount to the problem of deciding on an act from
$F \cup G \cup H$. So far as this use of v is concerned, any increasing
monotonic function of v would be equally satisfactory, but v
has an advantage in arithmetic simplicity when costs of observation
are involved. Consider, for example, the problem of whether to make
a particular observation at the random cost c or to make no observation
at all. The two sets of acts involved may then be symbolized by
$(F(x) - c)$ and F, respectively. The peculiar simplicity of v as a
measure of the value of a set of acts, in this context, is exhibited by
the almost obvious fact that $v(F(x) - c) = v(F(x)) - E(c)$. It may be
remarked in passing that v is a particularly good measure in any problem
where F, G, or H is, so to speak, made available by lot, a possibility
realized in (7.3.2), for example.

Finally, if one among several observations is to be chosen, each with
its own random cost (possibly including the null observation), the person
will choose an observation for which $v(F(x)) - E(c)$ is as large as
possible. If the number of observations among which decision is to
be made is infinite, that function may not attain a maximum value,
but the value of the situation to the person can reasonably be regarded
as the supremum of the function; there are, of course, observations
among those available for which the supremum is arbitrarily nearly
attained.
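The rule of choosing an observation for which v(F(x)) - E(c) is largest can be sketched for a finite case. Everything concrete below is an assumption made for illustration, not part of the text: states are finite, an act is a tuple of utilities, an observation is a list giving the observed label in each state, and a cost is a tuple of utilities per state. The function names are likewise hypothetical.

```python
def expected(u, prob):
    """Expected value of a utility-valued function of the state."""
    return sum(p * v for p, v in zip(prob, u))


def value(F, prob):
    """v(F): the largest expected utility available in F (F finite)."""
    return max(expected(f, prob) for f in F)


def value_given_observation(F, prob, x):
    """v(F(x)): the best basic act is chosen separately for each observed label."""
    total = 0.0
    for label in sorted(set(x)):
        total += max(
            sum(p * f[s] for s, p in enumerate(prob) if x[s] == label) for f in F
        )
    return total


def net_value(F, prob, x, cost):
    """v(F(x) - c) = v(F(x)) - E(c)."""
    return value_given_observation(F, prob, x) - expected(cost, prob)


# Hypothetical problem: four equally probable states, three basic acts.
prob = [0.25, 0.25, 0.25, 0.25]
F = [(4, 4, 0, 0), (0, 0, 4, 4), (1, 1, 1, 1)]
observations = {
    "null": ([0, 0, 0, 0], [0.0, 0.0, 0.0, 0.0]),
    "informative": ([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]),
    "cheap": ([0, 1, 0, 1], [0.1, 0.1, 0.1, 0.1]),
}
net = {name: net_value(F, prob, x, c) for name, (x, c) in observations.items()}
best = max(net, key=net.get)
```

Here the cheaper observation is worth less than no observation at all, since it tells nothing that bears on the choice among the acts, while the dearer one more than pays for itself.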
Partition Problems

1 Introduction 

In the introduction of the preceding chapter it was explained that
the treatment of decision problems in general had been carried to a
logical conclusion, and that to study decision problems further it had
become necessary to specialize. The notion of observation was accordingly
chosen as the subject of specialization. The situation now repeats
itself at a new level, for I have now covered the main points that
occur to me about observation in general, though I see considerably
more to say about a certain type of observation.

The type of observation problem to which the present chapter is devoted,
though relatively special, is still very general. Indeed, its generality
is suggested by the fact that no other type of problem is systematically
treated in modern statistics. In objectivistic terms, it would
be described as the type of decision problem in which the consequence
of each basic act depends only on which of several (possibly infinitely
many) probability distributions does in fact apply to the random variable
to be observed.

Modern statistics has no name for this type of problem, because it
recognizes no other type, and no particularly suggestive name occurs
to me. I am therefore tentatively adopting the noncommittal name
"partition problem." Such motivation as there is for that name will
be apparent when the concept is defined.

In non-objectivistic terms, a partition problem has the following
structure. There are, of course, basic acts F and an observation x.
The peculiar feature is a random variable b, which is typically not
subject to observation, with the property that every f in F is constant
given that b has any particular value b.

In many practical problems b takes on an infinity, even a non-denumerable
infinity, of values, but systematic consideration of such
problems would involve those advanced mathematical techniques that
are explicitly being avoided in this book. Glossing over such questions
of technique for the moment, the state of the world, which is itself a
random variable, might play the role of b; with respect to this b, any
observational decision problem would presumably be a partition problem.
It may, therefore, be inaccurate to call partition problems special,
but they are special whenever b is not equivalent to the state of the
world.

As has just been mentioned, the general policy of this book with respect
to mathematical technique restricts formal treatment of partition
problems here to those in which b assumes only a finite number of
different values, that is to say, those in which b is to all intents and
purposes a partition, whence the name "partition problem." For the
reader who is not familiar with the elements of the geometry of
n-dimensional convex bodies, there will be a distinct expository advantage
in confining the formal treatment still further to twofold partitions. At
the same time, by explicit statements and by the use of suggestive
notation, all readers will be given at least some idea of the extension of
the theory to n-fold partitions; indeed, a reader familiar, for example,
with Sections 16.1-2 of [V4], or with [B20], will find the extension as
plain as if it had been made explicitly. Thus the restriction to twofold
as opposed to n-fold partitions will be to the advantage of some and to
the disadvantage of none.

Partition problems are even closer than are observational problems
generally to the subject matter of statistics proper. In particular, in
the course of this chapter, multipersonal considerations will from time
to time be pointed out in connection with partition problems.

2 Structure of (twofold) partition problems 

A central feature of a twofold partition problem is, of course, a
twofold partition, or dichotomy, $B_i$, $i = 1, 2$. By way of abbreviation
let $\beta(i) = P(B_i)$ and $\beta = \{\beta(1), \beta(2)\}$. The
$\beta(i)$'s can be any two numbers such that $\beta(i) > 0$ and
$\sum_i \beta(i) = \beta(1) + \beta(2) = 1$. Since $\beta(2) = 1 - \beta(1)$,
it might seem superfluous to have a special notation for $\beta(2)$, but
this redundancy more than pays for itself in symmetry, especially in
the extension of the theory to n-fold partitions. The possibility that
one of the $\beta(i)$'s vanishes has been ruled out, for it is neither
typical nor interesting, and its retention would mar the exposition of
the theory.

Each basic act $f \in F$ is characterized by a pair of numbers $f_i$ such
that

(1)    $P(f(s) = f_i \mid B_i) = 1$

for each i. The technical assumption will be made that as f ranges
over F the numbers $f_i$ are bounded from above for each i, which is a
little more stringent than the now familiar assumption that $v(F) < \infty$.

The assumption expressed by (1) is made for definiteness and simplicity,
though its full force will seldom be used. The possibility of relaxing
(1) in certain contexts will be mentioned from time to time, especially
since this possibility is of some interest even in the exploitation
of (1) itself. In particular, for several pages now it will scarcely ever
be necessary to assume anything about the structure of F relative to
$B_i$ except that $E(f \mid B_i)$ is bounded from above for each i; for,
making the abbreviation $f_i = E(f \mid B_i)$, almost everything from here
through Exercise 1 applies verbatim.

The expected utility of any $f \in F$ can be computed in several forms
thus:

(2)
$$
\begin{aligned}
E(f) &= E(f \mid B_1)P(B_1) + E(f \mid B_2)P(B_2) \\
     &= f_1\beta(1) + f_2\beta(2) \\
     &= \sum_i f_i\beta(i) \\
     &= f_2 + (f_1 - f_2)\beta(1).
\end{aligned}
$$

The first of these forms expresses the expected value in general terms;
the second utilizes abbreviations; the third is an obvious mathematical
transcription of the second, particularly suggestive of extension to the
n-fold situation; the fourth sacrifices the symmetry exhibited by the
preceding three in order to take advantage of the relation between
$\beta(1)$ and $\beta(2)$. From the fourth form of (2), it is clear that,
for fixed f, E(f) is a linear function of $\beta(1)$. Henceforth that fact,
for example, would be expressed in symmetric form by saying that E(f) is
linear in $\beta$, and the dependence of E(f) on $\beta$ might be explicitly
indicated by writing $E(f \mid \beta)$.

Since in any one decision problem $\beta$ is constant, it might seem
pointless to emphasize that $E(f \mid \beta)$ is linear in $\beta$. But there
are, in fact, two different reasons for being interested in variation of
$\beta$. In the first place, once the observation x has been observed to
have the value x, the basic, or a priori, decision problem is replaced by
an a posteriori problem in which $P(B_i \mid x)$ plays the role originally
played by $P(B_i) = \beta(i)$. Second, interest in comparing different
people is becoming increasingly more explicit as the book proceeds. In
particular, it is of interest to compare people who have available the
same set of basic acts and who, at least so far as the distribution of x
and the acts in F are concerned, have the same conditional personal
probability given $B_i$, but who attach different probabilities $\beta(i)$
to the elements of the partition.

To emphasize its dependence on $\beta$, v(F) will sometimes be written
$v(F \mid \beta)$; its computation in the following fashion is fundamental
to the theory of partition problems.

(3)
$$
\begin{aligned}
v(F \mid \beta) &= \sup_{f \in F} E(f \mid \beta) \\
                &= \sup_{f \in F}\, [f_1\beta(1) + f_2\beta(2)] \\
                &= k(\beta),
\end{aligned}
$$

where $k(\beta)$ is defined by the equation in which it occurs. According to
Exercise 4 of Appendix 2, the function k is convex in $\beta$, that is, k is
convex when recognized as a function of $\beta(1)$ alone. Interpreted as a
pair of a priori probabilities, $\beta$ is confined to the open interval
defined by $\sum_i \beta(i) = 1$, $\beta(i) > 0$, but it is valuable to
recognize that k is defined, convex, and continuous on the closed interval
$\sum_i \beta(i) = 1$, $\beta(i) \ge 0$.
Many typical features of the relationship between F and k are
illustrated graphically by Figure 1. The abscissa of that graph represents
both $\beta(1)$ and $\beta(2)$, as indicated, and the ordinate is measured
in utiles. The straight lines, the left ends of which are marked a, b, c,
d, and e, graph as functions of $\beta$ the expected values of the five
basic acts of the particular problem represented. The ordinates at their
right and left ends, respectively, are the corresponding values of the
$f_1$'s and $f_2$'s. The graph of k is marked by heavy line segments. It is
seen that the lines a, c, and e, and they alone, touch the graph of k, for
they represent the only acts that are optimal for some value of $\beta$.
The act represented by d is inadmissible (if (1) is taken literally), being
in fact strictly dominated by every other act except c, and it is therefore
superfluous to the person, no matter what the value of $\beta$. The act
represented by b is obviously equally superfluous, but for a different
reason.

Figure 1

In many typical problems in which F has an infinity of elements, k
is, unlike the k in Figure 1, strictly convex, that is, its only intervals
of linearity are point intervals.
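The construction of k as the upper envelope of finitely many straight lines can be sketched as follows. The five acts below are hypothetical numbers chosen for illustration; they are not read off Figure 1, and the helper names are assumptions of this sketch. Exact rational arithmetic is used so that ties on the grid are detected reliably.

```python
from fractions import Fraction as Fr

# Hypothetical acts as pairs (f1, f2); these numbers are NOT taken from Figure 1.
acts = {
    "a": (Fr(0), Fr(4)),
    "b": (Fr(1), Fr(16, 5)),
    "c": (Fr(5, 2), Fr(5, 2)),
    "d": (Fr(2), Fr(1)),
    "e": (Fr(4), Fr(0)),
}


def expected(f, b1):
    """Fourth form of (2): E(f | beta) = f2 + (f1 - f2) beta(1)."""
    f1, f2 = f
    return f2 + (f1 - f2) * b1


def k(b1):
    """k(beta) = sup over F of E(f | beta), a maximum here because F is finite."""
    return max(expected(f, b1) for f in acts.values())


grid = [Fr(j, 100) for j in range(101)]

# Acts whose lines touch the graph of k somewhere on the grid.
touching = {
    name for name, f in acts.items() if any(expected(f, b) == k(b) for b in grid)
}

# Midpoint convexity of k, checked on a coarse sub-grid.
midpoint_convex = all(
    2 * k((p + q) / 2) <= k(p) + k(q) for p in grid[::10] for q in grid[::10]
)
```

In this made-up example d is strictly dominated by c, whereas b is admissible but nowhere optimal, so that both are superfluous, though for different reasons.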

Exercise 

1. Compute and graph k for the set F of dichotomous acts of the
form

$$
\begin{aligned}
f_1(\phi) &= 1 - (1 + \phi)^2, \\
f_2(\phi) &= 1 - (1 - \phi)^2,
\end{aligned}
\qquad -2 < \phi < +2.
$$

Answer: $k(\beta) = [\beta(1) - \beta(2)]^2 = [2\beta(1) - 1]^2$.
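The stated answer to Exercise 1 can be checked numerically. The sketch below is merely a brute-force verification under assumptions of its own (a uniform interior grid of values of φ and floating-point arithmetic); it is not offered as the intended method of solution, and the function names are hypothetical.

```python
def expected_utility(phi, b1):
    """E(f | beta) for the act indexed by phi, with beta(2) = 1 - beta(1)."""
    b2 = 1.0 - b1
    return b1 * (1.0 - (1.0 + phi) ** 2) + b2 * (1.0 - (1.0 - phi) ** 2)


def k_numeric(b1, steps=4000):
    """Approximate k(beta) by maximizing over an interior grid of phi in (-2, 2)."""
    return max(expected_utility(-2.0 + 4.0 * j / steps, b1) for j in range(1, steps))


def k_closed_form(b1):
    """The answer given in the text: k(beta) = [2 beta(1) - 1]^2."""
    return (2.0 * b1 - 1.0) ** 2


# Largest discrepancy over beta(1) = 0.05, 0.10, ..., 0.95.
worst_gap = max(abs(k_numeric(j / 20) - k_closed_form(j / 20)) for j in range(1, 20))
```

Setting the derivative with respect to φ to zero gives the maximizing value φ = β(2) - β(1), which lies well inside the permitted range; substituting it yields the closed form, with which the grid search agrees to within the resolution of the grid.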

Turn now to the relations between an observation x and the dichotomy
$B_i$. As before, it will be assumed for mathematical simplicity that the
values of x are confined to a finite set X. The probability that x attains
the value x given $B_i$, written $P(x \mid B_i)$, is fundamental in
connection with partition problems. For one thing, as has already been
indicated, there is interest in considering people who, though differing
with respect to $\beta$, agree with respect to $P(x \mid B_i)$. The
probability $P(x, B_i)$ that x attains the value x and that $B_i$
simultaneously obtains, the probability $P(x)$ that x attains the value x,
and the probability $\beta(i \mid x)$ of $B_i$ given that $x(s) = x$ are
derived from $P(x \mid B_i)$ and $\beta$ by means of Bayes' rule (3.5.4) and
the partition rule (3.5.3) thus:

(4)    $P(x, B_i) = P(x \mid B_i)\,\beta(i)$,

(5)    $P(x) = \sum_i P(x \mid B_i)\,\beta(i)$,

(6)    $\beta(i \mid x) = P(x, B_i)/P(x)$,

if $P(x) \ne 0$; and $\beta(i \mid x)$ is meaningless otherwise. It must be
remembered that $P(x, B_i)$, $P(x)$, and $\beta(i \mid x)$ depend on the value
of $\beta$ and that a really complete notation would show that dependence.
On the other hand, the condition that $P(x) \ne 0$ is independent of the
value of $\beta$.

When a second observation y is to be discussed, $\beta(i \mid y)$ is, in
defiance of strict logic, to be understood as the analogue of
$\beta(i \mid x)$, that is, as the conditional probability of $B_i$ given
that $y(s) = y$, not as the same function as $\beta(i \mid x)$ with y
substituted for x. Corresponding conventions apply to $P(y)$,
$P(y \mid B_i)$, and $P(y, B_i)$. Finally, free use will be made of such
contractions as $\beta(x)$ for $\{\beta(1 \mid x), \beta(2 \mid x)\}$.
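Equations (4) through (6) amount to a short computation. In the sketch below the nested list `lik[i][x]` is an assumed representation of P(x | B_i), indices start from zero rather than one, and the function names are hypothetical; exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction as Fr


def joint(beta, lik):
    """(4): P(x, B_i) = P(x | B_i) beta(i), where lik[i][x] stands for P(x | B_i)."""
    return [[lik[i][x] * beta[i] for x in range(len(lik[i]))] for i in range(len(beta))]


def marginal(beta, lik):
    """(5): P(x) = sum over i of P(x | B_i) beta(i)."""
    pj = joint(beta, lik)
    return [sum(pj[i][x] for i in range(len(beta))) for x in range(len(lik[0]))]


def posterior(beta, lik, x):
    """(6): beta(i | x) = P(x, B_i) / P(x); None stands for 'meaningless' when P(x) = 0."""
    px = marginal(beta, lik)[x]
    if px == 0:
        return None
    pj = joint(beta, lik)
    return [pj[i][x] / px for i in range(len(beta))]


# Hypothetical dichotomy with a two-valued observation.
beta = [Fr(1, 3), Fr(2, 3)]
lik = [[Fr(3, 4), Fr(1, 4)], [Fr(1, 4), Fr(3, 4)]]

p_x = marginal(beta, lik)
beta_given_0 = posterior(beta, lik, 0)
beta_given_1 = posterior(beta, lik, 1)
```

Two people who agree on `lik` but differ in `beta` would call `posterior` with different first arguments, which is the comparison of persons the text has in view.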

Equation (1) implies that

(7)    $E(f \mid B_i, x) = E(f \mid B_i)$

for all $f \in F$ and for all x such that $P(x \mid B_i) > 0$. Equation (7)
is the mathematical essence of the concept of a partition problem, and
virtually all that is to be said about partition problems applies verbatim,
if (7), even without (1), applies to such observations as may be under
discussion.

In view of (7),

(8)
$$
\begin{aligned}
E(f \mid \beta, x) &= \sum_i E(f \mid B_i, x)\,P(B_i \mid x) \\
                   &= \sum_i f_i\,\beta(i \mid x) \\
                   &= E(f \mid \beta(x)),
\end{aligned}
$$

if $P(x) > 0$.

3 The value of observation 

If the observation x is made, and it is found that $x(s) = x$, then the
a posteriori value of the set of basic acts, written $v(F \mid x)$, or more
fully $v(F \mid \beta, x)$, will typically be different from the a priori
value $v(F \mid \beta)$. Indeed, in view of (2.8),

(1)
$$
\begin{aligned}
v(F \mid \beta, x) &= \sup_{f \in F} E(f \mid \beta, x) \\
                   &= v(F \mid \beta(x)) \\
                   &= k(\beta(x)).
\end{aligned}
$$

This is the first illustration of the technical convenience of the
function k. It is known on general principles that $v(F(x)) \ge v(F)$, but
there is some interest in reverifying the inequality in the present
context; in particular, it is possible here to say in interesting terms
just when equality can obtain.

(2)
$$
\begin{aligned}
v(F(x) \mid \beta) &= E(v(F \mid \beta(x)) \mid \beta) \\
                   &= E(k(\beta(x)) \mid \beta) \\
                   &\ge k(E(\beta(x) \mid \beta)),
\end{aligned}
$$

where the terminal inequality is an application of Theorem 1 of Appendix 2.
To appreciate the inequality (2), it is necessary to calculate
$E(\beta(i \mid x))$ explicitly. This calculation, typical of many the
reader must henceforth be expected to make for himself, runs as follows,
where it is to be understood that the summation with respect to x applies
only to those terms for which $P(x)$ is different from 0.

(3)
$$
\begin{aligned}
E(\beta(i \mid x) \mid \beta) &= \sum_x \beta(i \mid x)\,P(x) \\
                              &= P(B_i) = \beta(i).
\end{aligned}
$$

Substituting (3) into (2) leads to the anticipated conclusion that

(4)    $v(F(x) \mid \beta) \ge k(\beta) = v(F \mid \beta)$.


According to Theorem 1 of Appendix 2, $v(F(x) \mid \beta)$ is definitely
greater than $v(F \mid \beta)$ unless $\beta(x)$ is confined with probability
one to some interval of linearity of k, in which case the observation x
may fairly be called irrelevant to the basic decision problem at hand. If
x is irrelevant, the interval of linearity to which $\beta(x)$ is confined
must, in view of (3), contain $\beta$. In the particularly interesting case
(and the only possible one, if k is strictly convex) in which $\beta(x)$ is
with probability one equal to a constant value, that value must therefore
be $\beta$. An observation for which $\beta(x)$ is with probability one
equal to $\beta$ may fairly be called utterly irrelevant, because it is
irrelevant no matter what set F of basic acts is associated with the
dichotomy $B_i$.

To say that x is utterly irrelevant is to say that, with probability
one,

(5)    $\beta(i \mid x) = \dfrac{P(x \mid B_i)\,\beta(i)}{P(x)} = \beta(i)$.

Since $\beta(i) > 0$, (5) is equivalent to the condition that

(6)    $P(x \mid B_i) = P(x)$,

at least when $P(x) > 0$. Furthermore, it is obvious from (2.5), again
noting that $\beta(i) > 0$, that, if $P(x) = 0$, then $P(x \mid B_i) = 0$.
Therefore x is utterly irrelevant, if and only if (6) holds for all x and
i, that is, if and only if the distribution of x given $B_i$ is independent
of i. This form of the condition is intuitively evoked by the words
"utterly irrelevant" and has the advantage of not involving $\beta$.
It is noteworthy that whether an observation is utterly irrelevant
depends neither on the particular set of basic acts, nor on the value of
$\beta$, so people will agree on what is utterly irrelevant independent of
their personal a priori probabilities and the acts among which they are
free to choose.
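The inequality (4) and the notion of utter irrelevance can be illustrated numerically. The sketch below assumes a finite set of acts given as pairs (f_1, f_2), a likelihood table `lik[i][x]` standing for P(x | B_i) with indices starting from zero, and helper names that are hypothetical; it is an illustration, not a general treatment.

```python
from fractions import Fraction as Fr


def k(beta, acts):
    """k(beta) = sup over F of f_1 beta(1) + f_2 beta(2), a maximum for finite F."""
    return max(f1 * beta[0] + f2 * beta[1] for f1, f2 in acts)


def value_with_observation(beta, acts, lik):
    """v(F(x) | beta) = sum over x of k(beta(x)) P(x), restricted to P(x) > 0."""
    total = Fr(0)
    for x in range(len(lik[0])):
        px = lik[0][x] * beta[0] + lik[1][x] * beta[1]
        if px > 0:
            post = (lik[0][x] * beta[0] / px, lik[1][x] * beta[1] / px)
            total += k(post, acts) * px
    return total


def utterly_irrelevant(lik):
    """Condition (6): the distribution of x given B_i does not depend on i."""
    return lik[0] == lik[1]


# Hypothetical acts and observations.
acts = [(Fr(4), Fr(0)), (Fr(0), Fr(4)), (Fr(5, 2), Fr(5, 2))]
beta = (Fr(1, 2), Fr(1, 2))
informative = [[Fr(3, 4), Fr(1, 4)], [Fr(1, 4), Fr(3, 4)]]
noise = [[Fr(1, 3), Fr(2, 3)], [Fr(1, 3), Fr(2, 3)]]

prior_value = k(beta, acts)
value_informative = value_with_observation(beta, acts, informative)
value_noise = value_with_observation(beta, acts, noise)
```

With these numbers the informative observation raises the value from 5/2 to 3, while the observation whose distribution is the same under both elements of the dichotomy leaves it at 5/2, whatever the acts may be.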

The greatest lower bound in x of $v(F(x) \mid \beta)$, namely
$v(F \mid \beta)$, and the circumstances under which this bound is attained
having been established, it is natural to turn to a parallel investigation
of the least upper bound. A foothold for that investigation is found in
the remark that the chord joining the ends of the graph of k never lies
below the graph. Analytically,

(7)    $k(\beta) \le \beta(1)\,k(1, 0) + \beta(2)\,k(0, 1) = l(\beta)$,

where $l(\beta)$ is defined by the context. Unless one of the $\beta(i)$'s
vanishes, equality holds in (7), if and only if $k(\beta)$ is a linear
function. In view of (7) and (3),

(8)    $v(F(x) \mid \beta) = E(k(\beta(x)) \mid \beta) \le E(l(\beta(x)) \mid \beta) = l(\beta)$.

The inequality (8) gives an upper bound for $v(F(x))$. In graphical
terms it says that, for any $\beta$, no observation can add more to the
value $k(\beta)$ of F than the vertical distance at $\beta$ between the
graph of k and the graph of the chord joining the ends of k.

Equality obtains in (8), if k is linear, in which case the upper and
lower bounds are equal to each other irrespective of the value of $\beta$
and the nature of the observation. If F is dominated by a single f, that
is, if there is a single f optimal given $B_i$ for both values of i, then k
is linear. It can easily be verified that, provided F is finite and (1)
actually obtains, this is indeed the only circumstance under which k is
linear, and, even if these provisions are not satisfied, the possibilities
are not much more interesting.

Suppose, then, that k is not linear; equality can hold in (8), if and
only if $\beta(x)$ is with probability one confined to the ends of the
interval, a condition that does not depend at all on F. By simple
considerations, which have by now been rendered familiar, this condition
on x is equivalent to the condition that

(9)    $P(x \mid B_1)\,P(x \mid B_2) = 0$,

for all x. An observation satisfying (9) may fairly be called definitive,
because, if (1) obtains, such an observation removes all uncertainty
about the outcome of each $f \in F$, no matter what $\beta$ may be.
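The upper bound (8) and its attainment by a definitive observation can be checked on a small case. As before, the acts, the prior, and the likelihood table `lik[i][x]` for P(x | B_i) are hypothetical, indices start from zero, and the function names are assumptions of this sketch.

```python
from fractions import Fraction as Fr


def k(beta, acts):
    """k(beta) for a finite set of acts given as pairs (f_1, f_2)."""
    return max(f1 * beta[0] + f2 * beta[1] for f1, f2 in acts)


def chord(beta, acts):
    """l(beta) = beta(1) k(1, 0) + beta(2) k(0, 1), the chord of (7)."""
    return beta[0] * k((1, 0), acts) + beta[1] * k((0, 1), acts)


def is_definitive(lik):
    """Condition (9): P(x | B_1) P(x | B_2) = 0 for every x."""
    return all(p1 * p2 == 0 for p1, p2 in zip(lik[0], lik[1]))


def value_with_observation(beta, acts, lik):
    """v(F(x) | beta) = sum over x of k(beta(x)) P(x), restricted to P(x) > 0."""
    total = Fr(0)
    for x in range(len(lik[0])):
        px = lik[0][x] * beta[0] + lik[1][x] * beta[1]
        if px > 0:
            post = (lik[0][x] * beta[0] / px, lik[1][x] * beta[1] / px)
            total += k(post, acts) * px
    return total


acts = [(Fr(4), Fr(0)), (Fr(0), Fr(4)), (Fr(5, 2), Fr(5, 2))]
beta = (Fr(1, 3), Fr(2, 3))
definitive = [[Fr(1), Fr(0), Fr(0)], [Fr(0), Fr(1, 2), Fr(1, 2)]]
partial = [[Fr(3, 4), Fr(1, 4), Fr(0)], [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]

upper = chord(beta, acts)
value_definitive = value_with_observation(beta, acts, definitive)
value_partial = value_with_observation(beta, acts, partial)
```

The definitive observation attains the chord exactly, and the other observation, which leaves some uncertainty about the dichotomy, falls between k(β) and l(β).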

Perhaps many of the observations made in everyday life are definitive,
or practically so. Once Old Mother Hubbard looked in the cupboard,
her doubts were reduced to the vanishing point. None the less,
definitive observations do not play an important part in statistical
theory, precisely because statistics is mainly concerned with uncertainty,
and there is no uncertainty once an observation definitive for
the context at hand has been made.

4 Extension of observations, and sufficient statistics

It was shown in § 6.4 that a statistic, or contraction, y of an
observation x is never worth more than x and is typically worth less. The
purpose of the present section is to explore the relation between an
observation and a contraction of itself in the case of a partition problem,
especially to explore the special conditions in that case under which the
statistic is as valuable as the observation itself.

Let x and y be two observations such that y is a statistic of x, that
is, such that, for some function $y'$, $y(s) = y'(x(s))$ with probability
one. The values of F(x) and F(y) can be compared by the following
calculation, which in the light of the preceding section will need but
little explanation.

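(1)
$$
\begin{aligned}
v(F(x) \mid \beta) &= E(k(\beta(x)) \mid \beta) \\
                   &= \sum_y E(k(\beta(x)) \mid \beta, y)\,P(y).
\end{aligned}
$$

(2)    $E(k(\beta(x)) \mid \beta, y) \ge k(E(\beta(x) \mid \beta, y))$,

if $P(y) > 0$.

(3)
$$
\begin{aligned}
E(\beta(i \mid x) \mid \beta, y) &= \sum_x \beta(i \mid x)\,P(x \mid y) \\
                                 &= \sum_x \frac{P(x, B_i)}{P(x)} \cdot \frac{P(x, y)}{P(y)},
\end{aligned}
$$

if $P(y) > 0$.

Because of the special relationship between x and y, $P(x, y) = 0$ unless
$y'(x) = y$, in which case $P(x, y) = P(x)$. Understanding that the
summation indicated by $\sum'$ in (4) below extends only over those values
of x for which $y'(x) = y$, the calculation is continued thus:

(4)
$$
\begin{aligned}
E(\beta(i \mid x) \mid \beta, y) &= {\sum_x}' \frac{P(x, B_i)}{P(x)} \cdot \frac{P(x)}{P(y)} \\
                                 &= \frac{{\sum_x}' P(x, B_i)}{P(y)} \\
                                 &= \frac{P(y, B_i)}{P(y)} \\
                                 &= \beta(i \mid y).
\end{aligned}
$$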


Therefore,

(5)    $v(F(x) \mid \beta) \ge \sum_y k(\beta(y))\,P(y) = v(F(y) \mid \beta)$.

After the preceding section, it seems almost superfluous to explain
that the point of the calculation above is not to obtain the inequality
(5), which has already been derived with less labor and greater generality
in Exercises 6.3.8 and 6.3.13b, but to be able to discuss when equality
holds in (5). The calculation makes it clear that equality holds in
(5), if and only if equality holds in (2) for every y of positive
probability. This in turn is equivalent to the condition that, given y,
$\beta(x)$ is confined with probability one to an interval of linearity of
k. A sufficient condition for that is that, given y, $\beta(x)$ be confined
with probability one to a single value, which cannot be other than
$\beta(y)$; if k is strictly convex, the almost certain confinement of
$\beta(x)$ to $\beta(y)$ is also necessary. Now, if, for every y of positive
probability, $P(\beta(x(s)) = \beta(y) \mid y) = 1$, then it is true that
$\beta(x) = \beta(y)$ with unconditional probability one, that is,

(6)    $P(\beta(x(s)) = \beta(y(s))) = 1$.

The condition (6) clearly does not depend on $F$, and the following
calculation so expresses it as to make clear that it does not depend on
$\beta$ either. Equation (6) is satisfied, if and only if

$$\frac{P(x \mid B_i)\,\beta(i)}{P(x)} = \frac{P(y'(x) \mid B_i)\,\beta(i)}{P(y'(x))} \tag{7}$$

when $P(x) > 0$; or, if and only if

$$\frac{P(x \mid B_i)}{P(y \mid B_i)} = \frac{P(x)}{P(y)}, \tag{8}$$

when $P(x \mid B_i) > 0$; or, again, if and only if

$$P(x \mid B_i, y) = P(x \mid y), \tag{9}$$

when $P(y \mid B_i) > 0$; or finally if and only if $P(x \mid B_i, y)$ is
independent of $i$ for those values of $i$ for which it is defined. In
this form, and yet another to be derived in connection with (10), the
condition is widely studied in modern statistical theory, and a statistic
satisfying the condition is there called a sufficient statistic. The name
is well justified, for, as has just been shown, it is sufficient, for any
purpose to which $x$ might be put, to know $y$, if and only if $y$ is a
sufficient statistic for $x$.
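As a numerical illustration of the last form of the condition, the following sketch checks by brute-force enumeration whether $P(x \mid B_i, y)$ is the same for every hypothesis. It assumes, purely for illustration, independent 0/1 observations whose success probability lies strictly between 0 and 1 and depends on $B_i$; the function names and the particular probabilities are hypothetical and not part of the text.

```python
from itertools import product
from math import isclose


def p_x_given_b(x, p):
    """P(x | B_i) for independent 0/1 observations with success probability p."""
    prob = 1.0
    for xr in x:
        prob *= p if xr == 1 else 1.0 - p
    return prob


def conditional_given_statistic(n, p, statistic):
    """Map each 0/1 sequence x of length n to P(x | B_i, y'(x)); assumes 0 < p < 1."""
    xs = list(product((0, 1), repeat=n))
    p_y = {}
    for x in xs:
        y = statistic(x)
        p_y[y] = p_y.get(y, 0.0) + p_x_given_b(x, p)
    return {x: p_x_given_b(x, p) / p_y[statistic(x)] for x in xs}


def is_sufficient(n, ps, statistic):
    """True if P(x | B_i, y) agrees for every hypothesis value p in ps."""
    tables = [conditional_given_statistic(n, p, statistic) for p in ps]
    reference = tables[0]
    return all(
        isclose(table[x], reference[x], abs_tol=1e-12)
        for table in tables[1:]
        for x in reference
    )


# The total count passes the check; the first observation alone does not.
total_is_sufficient = is_sufficient(4, (0.3, 0.7), sum)
first_is_sufficient = is_sufficient(4, (0.3, 0.7), lambda x: x[0])
print(total_is_sufficient, first_is_sufficient)
```

Enumeration is only practical for very small $n$; it is meant to make the definition concrete, not to serve as a general test.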

A different, and perhaps more congenial, approach to sufficient
statistics is the following. If the person observes the particular value
$y$ of $y$, his original basic decision problem is replaced by a new one
with the same basic acts, but with $\beta$ replaced by $\beta(y)$.
Strictly speaking, this will fail to be a partition problem, in case
$\beta(y)$ is $(0, 1)$ or $(1, 0)$, or, for brevity, if $\beta(y)$ is
extreme. To see whether $v(F(x) \mid \beta)$ is really greater than
$v(F(y) \mid \beta)$, it is enough to investigate whether, for some $y$
of positive probability for which $\beta(y)$ is not extreme, $x$ is
relevant to the partition problem based on $\beta(y)$; for if $\beta(y)$
is extreme there can be no value in following the observation that $y$
has occurred by the observation of $x$. Therefore, $x$ will be a
worthless addition to $y$, if, for every $y$ for which $\beta(y)$ is not
extreme, $x$ is utterly irrelevant, that is, if $y$ is sufficient for
$x$. If $k$ is strictly convex, the condition is also necessary.

The recognition of sufficient statistics in explicit problems is often
facilitated by the following factorability criterion. A statistic $y$ is
sufficient for $x$ if and only if there exists at least one pair of
functions $R$ and $S$ such that

$$P(x \mid B_i) = R(y'(x), i)\, S(x). \tag{10}$$

The necessity of the condition follows from the exhibition of a
particular $R$ and $S$ for a sufficient statistic thus:

$$\begin{aligned}
P(x \mid B_i) &= \sum_y P(x \mid B_i, y)\, P(y \mid B_i) \\
&= \sum_y P(x \mid y)\, P(y \mid B_i) \\
&= P(y'(x) \mid B_i)\, P(x \mid y'(x)).
\end{aligned} \tag{11}$$

On the other hand, if $P(x \mid B_i)$ can be expressed in the form (10),
$y$ can be seen to be sufficient for $x$ thus: If $P(x \mid B_i, y)$ is
meaningful, it is given by

$$P(x \mid B_i, y) = \frac{P(x, y \mid B_i)}{P(y \mid B_i)} =
\begin{cases}
0, & \text{if } y'(x) \ne y, \\[4pt]
\dfrac{P(x \mid B_i)}{P(y \mid B_i)} = \dfrac{S(x)}{\sum_{y'(x') = y} S(x')}, & \text{if } y'(x) = y,
\end{cases} \tag{12}$$

which is independent of $i$. The reader may be interested in asking
himself, as an exercise, what freedom there is in choosing $R$ and $S$
when at least one such pair of factors exists.

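The factorability criterion can be tried out on a small example. The sketch below assumes, for illustration only, a family in which each observation takes the value $x_r \le i$ with probability $2x_r / (i(i+1))$ under $B_i$, with the maximum as the candidate statistic; the names `R`, `S`, and the helper functions are hypothetical. It verifies the factorization (10) exactly with rational arithmetic and confirms that the conditional probability computed directly agrees with the $S(x) / \sum S(x')$ form of (12).

```python
from fractions import Fraction
from itertools import product
from math import prod


def p_single(xr, i):
    """P(x_r | B_i): weight proportional to x_r on 1..i (illustrative family)."""
    return Fraction(2 * xr, i * (i + 1)) if 1 <= xr <= i else Fraction(0)


def p_joint(x, i):
    """P(x | B_i) for independent components."""
    return prod((p_single(xr, i) for xr in x), start=Fraction(1))


def R(y, i, n):
    """Factor depending on x only through the maximum y."""
    return Fraction(2, i * (i + 1)) ** n if y <= i else Fraction(0)


def S(x):
    """Factor that does not depend on the hypothesis i."""
    return prod(x)


def conditional_from_S(x, support):
    """S(x) divided by the sum of S(x') over x' with the same maximum."""
    y = max(x)
    return Fraction(S(x), sum(S(xp) for xp in support if max(xp) == y))


def conditional_direct(x, i, support):
    """P(x | B_i, y) computed from the joint distribution."""
    p_y = sum(p_joint(xp, i) for xp in support if max(xp) == max(x))
    return p_joint(x, i) / p_y


n, top = 2, 4
support = list(product(range(1, top + 1), repeat=n))
hypotheses = range(1, top + 1)

factorization_holds = all(
    p_joint(x, i) == R(max(x), i, n) * S(x) for i in hypotheses for x in support
)
conditional_free_of_i = all(
    conditional_direct(x, i, support) == conditional_from_S(x, support)
    for i in hypotheses
    for x in support
    if max(x) <= i
)
print(factorization_holds, conditional_free_of_i)
```

Exact fractions are used so that the two sides of each identity can be compared with `==` rather than with a floating-point tolerance.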
Interest in sufficient statistics is not confined, of course, to twofold,
or even finite, partitions. With that in mind, the various criteria for
sufficient statistics have been given in such terms as to be valid for any
finite partition and the usual infinite ones. They require some
modification if the observations are not confined to a finite, or at any
rate denumerable, set of values, but formal details of that important
extension will not be given here. Elementary treatments are given in most
textbooks of mathematical statistics; more advanced and general
treatments are given in [B2], [L6], and [H3].

There are several examples of sufficient statistics in the exercises
below; others are given in almost any fairly advanced textbook on
statistics (in particular, in [C9]); and one other general example of
extraordinary importance is treated in the next section.

Exercises 

In these exercises, let $x$ denote a multiple observation
$x = \{x_1, \ldots, x_n\}$, where, given $B_i$, the $x_r$'s are
independent and identically distributed. There will be no real advantage
here in thinking of the partition as twofold, or even finite, and for
some of the exercises it will be impractical to do so.

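1. Let

$$P(x_r \mid B_i) = \begin{cases} p_i, & \text{if } x_r = 1, \\ q_i, & \text{if } x_r = 0, \\ 0, & \text{otherwise,} \end{cases}$$

where $p_i + q_i = 1$, and let $y'(x) = \sum_r x_r$.

Show that:

(a) $P(x \mid B_i) = p_i^{y} q_i^{\,n-y}$,

(b) $y$ is sufficient for $x$, using the factorability criterion,

(c) $P(y \mid B_i) = \binom{n}{y} p_i^{y} q_i^{\,n-y}$, where, as always, $\binom{n}{y} = n!/y!\,(n - y)!$.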

2. For each positive integer $i$, let

$$P(x_r \mid B_i) = \begin{cases} i^{-1}, & \text{if } x_r \le i, \\ 0, & \text{otherwise,} \end{cases}$$

where the values of $x_r$ are confined to the positive integers, and let
$y'(x) = \max_r x_r$. Show that:

(a) $P(x \mid B_i) = i^{-n}$, if $y \le i$; $= 0$, otherwise,

(b) $y$ is sufficient for $x$.

3. In the two exercises above it has been possible to choose the factor
$S$ identically equal to 1. To exhibit a more typical example, let $i$,
$x_r$, and $y$ be confined to the positive integers with
$y'(x) = \max_r x_r$, as in the preceding exercise, and let

$$P(x_r \mid B_i) = \begin{cases} \dfrac{2 x_r}{i(i + 1)}, & \text{if } x_r \le i, \\[4pt] 0, & \text{otherwise.} \end{cases}$$

Show that:

(a) $P(x \mid B_i) = \left[\dfrac{2}{i(i + 1)}\right]^{n} \prod_r x_r$, if $y \le i$; $= 0$, otherwise,

(b) $y$ is sufficient for $x$.

4. Put no restriction on the conditional distributions $P(x_r \mid B_i)$
except that $x_r$ be confined with probability one to some fixed finite
set. Say, for the moment, that two values $x$ and $x'$ of $x$ are team
mates, if one arises from the other by permutation of the component
observations. This divides the possible values of $x$ into teams, and,
academic though it may seem, the team to which $x$ belongs can be taken
as $y'(x)$. Show that the probability of $x$ given $y'(x)$ and $B_i$ is
independent of $i$ (if it is defined at all), so that the statistic
$y'(x)$ is sufficient for $x$.

If the values of the $x_r$'s happen to be real numbers, then for any $x$
it is possible to permute the component observations to obtain a
non-decreasing sequence of $n$ (not necessarily distinct) numbers, and
only one such non-decreasing sequence can be so obtained from each $x$.
The sequence thus attached through $x$ to each $s$ is called in
statistical usage the sequence of order statistics corresponding to $x$.
Since team mates, and only team mates, have the same order statistics,
the set of order statistics regarded as a single statistic is equivalent
to the team statistic $y'(x)$ defined more generally in the paragraph
above and is therefore sufficient.

5. Let $x_r$, given $B_i$, be subject to the normal probability density
with mean $\mu_i$ and variance $\sigma_i^2$, that is,

$$\phi(x_r \mid B_i) = (2\pi)^{-1/2} \sigma_i^{-1} \exp\{-(x_r - \mu_i)^2 / 2\sigma_i^2\}. \tag{13}$$

This situation, though elementary, does not fall within the technical
scope of this book, because $x_r$ is not confined to a finite set of
values. The reader familiar with probability densities will see, however,
that the density of $x$ is

$$\phi(x_1, \ldots, x_n \mid B_i) = (2\pi)^{-n/2} \sigma_i^{-n} \exp\left\{-\frac{1}{2\sigma_i^2}\left[\sum_r x_r^2 - 2\mu_i \sum_r x_r + n\mu_i^2\right]\right\}, \tag{14}$$

which suggests that $y$, defined by

$$y'(x) = \left\{\sum_r x_r,\ \sum_r x_r^2\right\}, \tag{15}$$

may fairly be called a sufficient statistic for $x$.

Show in the same heuristic way that, if $\sigma_i$ is independent of $i$,
then $y'(x) = \sum_r x_r$ defines a sufficient statistic, and that, if
$\mu_i$ is independent of $i$, then
$y'(x) = \sum_r x_r^2 - 2\mu \sum_r x_r$ (with $\mu$ the common value of
the $\mu_i$) does so.

6. If $w$ and $z$ are observations independent of each other given
$B_i$, under what conditions can $w$ be sufficient for $\{w, z\}$?

7. To break away from independent observations, suppose that, in the
event $B_i$, $n$ cards are dealt from a thoroughly shuffled deck of
$n + i$ cards each bearing a different serial number from 1 through
$n + i$. Let $w_r$ be the number on the $r$th card dealt and
$w = \{w_1, \ldots, w_n\}$. Show that $\max_r w_r$ defines a sufficient
statistic for $w$, and that the $w_r$'s are not independent.

8. If $z$ extends $w$, and $w$ is sufficient for $y$, then $z$ is also
sufficient for $y$.

9. If $z$ is sufficient for $w$, and $y$ is independent of both $z$ and
$w$, then $\{z, y\}$ is sufficient for $\{w, y\}$.

10. Every definitive statistic is sufficient.

In virtually all statistics texts it would be said that the $y$ defined
by (15) constitutes not one statistic, but two; similarly, the set of
order statistics would ordinarily be referred to as $n$ statistics rather
than as one. There are contexts in which it is appropriate to try to
count statistics in that fashion, but, so far as the theory of sufficient
statistics is concerned, it often seems fruitless, if not positively
detrimental, to do so.

The concept of sufficient statistics has proved of great value in
statistical theory and practice. The reason for this does not seem to me
altogether easy to analyze, but, as the exercises above illustrate, the
families of distributions most frequently studied in statistics are
generally rich in sufficient statistics. It is hard to separate cause
from effect here, for the distributions that are most studied tend to be
those having the greatest mathematical simplicity, and the presence of
striking sufficient statistics, such as those exhibited by Exercises 1,
2, 3, 5, and 7, is among the sources of mathematical simplicity most
often met in the study of particular families of distributions.

It must be emphasized that sufficient statistics often provide a
significant saving in the mechanical labor of storing and presenting
data. Thus, in any experiment faithfully represented by Exercise 1, it is
sufficient, in both the technical and ordinary senses of the word, to
record a single integer $y$ in place of the list of $x_r$'s, which might
well be very long. Several of the other exercises would in principle also
lead to great savings of this sort, but Exercise 5 is the only other that
arises frequently in practice.

The concept of sufficient statistics was introduced, together with much
of the theory associated with it, by R. A. Fisher (cf. index, [F6]). The
subject has been one of continuing interest and has been explored in
several directions; key references are [B2], [El], [L6], [H3], [K15], and
[M5].


5 Likelihood ratios

The random variable $\beta(x)$ has played so important a role in
preceding sections that the reader will probably not be surprised to find
that $\beta(x)$ is a sufficient statistic for $x$, a conclusion that, in
the light of the factorability criterion (4.10), can be seen thus:

$$P(x \mid B_i) = \frac{P(B_i \mid x)}{\beta(i)}\, P(x) = \frac{\beta(i \mid x)}{\beta(i)}\, P(x). \tag{1}$$


If a statistic is sufficient, it is sufficient irrespective of the value
of $\beta$; moreover, any multiple of it by a non-zero constant is also
sufficient. Therefore, (1) implies that for any numbers $\alpha(i)$ such
that $\alpha(i) > 0$, the multiple observation $r(\alpha)$ defined by

$$r_i(x; \alpha) =_{Df} \frac{P(x \mid B_i)}{\sum_j \alpha(j)\, P(x \mid B_j)}, \qquad r(x; \alpha) =_{Df} \{r_1(x; \alpha),\ r_2(x; \alpha)\} \tag{2}$$

is a sufficient statistic for $x$. Since

$$\sum_j \alpha(j)\, r_j(x; \alpha) = 1, \tag{3}$$

there is some redundancy in retaining both components, but this
redundancy is more than compensated by the advantage of retaining
symmetry, especially when $n$-fold partitions are contemplated.

Formally, the $r(\alpha)$'s are an infinite family of sufficient
statistics, one for each $\alpha$; but to all intents and purposes they
represent but one sufficient statistic, for any $r(\alpha)$ is equivalent
to any other, say $r(\alpha')$, as can be demonstrated thus:

$$r_i(x; \alpha) = \frac{P(x \mid B_i)}{\sum_j \alpha(j)\, P(x \mid B_j)} = \frac{r_i(x; \alpha')}{\sum_j \alpha(j)\, r_j(x; \alpha')}. \tag{4}$$

Having such a multiplicity of forms for what is essentially one
important statistic is rather embarrassing, so there is some incentive to
pick a standard form. Setting each $\alpha(i) = 1$ recommends itself as
convenient and leads to the particular statistic $r = \{r_1, r_2\}$,
where

$$r_i(x) =_{Df} \frac{P(x \mid B_i)}{\sum_j P(x \mid B_j)}. \tag{5}$$

This form is indeed convenient for twofold and, more generally, for
$n$-fold partitions, but, where infinite partitions are to be dealt with,
its apparent naturalness is misleading, for the sum in the denominator of
(5) is then typically divergent. In the case of twofold partitions, a
convenient form for the statistic is that of a likelihood ratio, in the
sense introduced in § 3.6, for it is easy to see that, infinite numbers
being admitted, $P(x \mid B_1)/P(x \mid B_2)$ is equivalent to $r$.
Henceforth, any statistic equivalent to $r$ will be called a likelihood
ratio of $x$ with respect to the partition $B_i$, a definition that does
not seriously conflict with ordinary statistical usage of the term.
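The equivalence of the various forms of the likelihood ratio can be checked numerically. The sketch below is illustrative only: the likelihood values, weights, and prior are arbitrary assumed numbers, and the function names are hypothetical. It computes the weighted form of the statistic for two choices of weights, converts one into the other without consulting the original likelihoods, and confirms that the a posteriori probabilities can be obtained from the statistic alone.

```python
from math import isclose


def r_alpha(likelihoods, alpha):
    """r_i(x; alpha) = P(x | B_i) / sum_j alpha(j) P(x | B_j)."""
    denom = sum(a * p for a, p in zip(alpha, likelihoods))
    return [p / denom for p in likelihoods]


def convert(r_other, alpha):
    """Recover r(x; alpha) from any other version r(x; alpha')."""
    denom = sum(a * r for a, r in zip(alpha, r_other))
    return [r / denom for r in r_other]


def posterior(prior, weights):
    """A posteriori probabilities from the prior and any positive multiple of the likelihoods."""
    denom = sum(b * w for b, w in zip(prior, weights))
    return [b * w / denom for b, w in zip(prior, weights)]


likelihoods = [0.02, 0.05]   # assumed values of P(x | B_1), P(x | B_2)
alpha = [1 / 3, 2 / 3]       # assumed positive weights
prior = [0.4, 0.6]           # assumed a priori probabilities

r_std = r_alpha(likelihoods, [1, 1])   # the standard form, all weights equal to 1
r_weighted = r_alpha(likelihoods, alpha)

print(r_std, r_weighted)
print(convert(r_std, alpha))
print(posterior(prior, r_std), posterior(prior, likelihoods))
```

Because the posterior depends on the likelihoods only up to a common positive factor, every version of the statistic yields the same a posteriori probabilities.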

Figure 1 illustrates a geometric interpretation of likelihood ratios
that is sometimes valuable. The figure can best be described by telling
how to draw it. First draw a pair of cartesian coordinate axes for
variables $u_1$ and $u_2$. Next draw the two line segments represented by
$u_1 + u_2 = 1$ and $(u_1/\alpha(1)) + (u_2/\alpha(2)) = 1$ with the
$u_i$'s non-negative. The left ends of these segments are indicated in
Figure 1 by $a$ and $b$, respectively, the particular value
$\alpha = \{1/3, 2/3\}$ being used for illustration. Now plot the point
$\{P(x \mid B_1), P(x \mid B_2)\}$. If $x$ has positive probability (for
any, and therefore for all, $\beta$), this point will be different from
the origin $0$, so it will be possible to draw the (dashed) line
connecting the origin with the point $\{P(x \mid B_1), P(x \mid B_2)\}$.
This line (or ray through the origin, as it is often called) must
necessarily pierce the line segments $a$ and $b$. The important
geometrical fact, which the reader will have no difficulty in verifying,
is that these intersections occur at the points $\{r_1(x), r_2(x)\}$ and
$\{r_1(x; \alpha), r_2(x; \alpha)\}$, respectively. It is also obvious
that the ratio $P(x \mid B_1)/P(x \mid B_2)$ is the reciprocal of the
slope of the ray.

Since, to each $x$ that occurs with positive probability, there
corresponds a ray through the origin, the ray can be taken as a
statistic; according to the geometrical construction of the preceding
paragraph, this statistic is equivalent to $r$ and is therefore a
likelihood ratio of $x$ with respect to the partition $B_i$.

The ray connecting the origin with a point $\{u_1, u_2\}$ can
conveniently be represented by the suggestive notation $u_1 : u_2$,
though, of course, different pairs of numbers can represent the same ray.
More explicitly, if $\lambda$ is any number different from 0,
$\lambda u_1 : \lambda u_2$ represents the same ray as $u_1 : u_2$. In
analytical projective geometry any pair of numbers representing a ray in
this fashion is called a set of homogeneous coordinates of the ray. The
redundancy of the notation $u_1 : u_2$ may be removed by, for example,
characterizing the ray by the reciprocal of its slope $u_1/u_2$. Such
non-homogeneous coordinatization entails a sacrifice in symmetry and the
necessity of admitting infinity as a meaningful value of the quotient;
both losses are quite troublesome in extension of these geometric
concepts to cartesian space of $n$ dimensions, which is necessary in
connection with $n$-fold partitions. In homogeneous coordinates the
likelihood ratio can conveniently be represented by any of the equally
good sets of homogeneous coordinates, $P(x \mid B_1) : P(x \mid B_2)$,
$r_1(x) : r_2(x)$, and $r_1(x; \alpha) : r_2(x; \alpha)$. Finally, it may
be remarked that $P(x \mid B_1)/P(x \mid B_2)$ is a non-homogeneous
coordinate. Thus the many equivalent forms in which the likelihood ratio
statistics can be naturally expressed correspond to the many different
notations by which a ray through the origin can be naturally designated.

The most remarkable fact about the likelihood ratio considered as a
statistic is that it is necessary, so to speak, as well as sufficient. By
that I mean that to have the advantages of knowing $x$ it is necessary as
well as sufficient to know the likelihood ratio. The point can be put
formally thus:

Theorem 1. If $y$ is sufficient for $x$, then $y$ is an extension of $r$.

Proof. The theorem is virtually obvious in terms of the factorability
criterion for sufficient statistics, for in the notation of (4.10)

$$r_i(x) = \frac{R(y, i)}{\sum_j R(y, j)} \tag{6}$$

with probability one, exhibiting $r$ as a function of $y$. ♦

Corollary 1. If $z$ is sufficient for $x$, and if every $y$ sufficient
for $x$ is an extension of $z$, then $z$ is equivalent to $r$.

By ordinary analytic standards, the likelihood ratio seems to be a
rather complicated statistic, at least in the case of $n$-fold
partitions, where $n$ is at all large; for, to one who takes seriously
the idea that a multiple statistic should not also be regarded as a
single statistic, the likelihood ratio seems at first sight to be $n$, or
perhaps $(n - 1)$, statistics. Yet Theorem 1 and its corollary show that
the likelihood ratio is, in a fundamental sense, the most compact
sufficient statistic that a partition problem admits.

As an explicit example of a likelihood ratio, consider the twofold
partition problem arising from Exercise 4.1 on confining attention to two
different values of $p$, say $p_1$ and $p_2$. The likelihood ratio $r$ is
easily computed thus:

$$P(x \mid B_i) = p_i^{y} q_i^{\,n-y} = q_i^{n} \left(\frac{p_i}{1 - p_i}\right)^{y}, \tag{7}$$

so

$$\frac{r_1(x)}{r_2(x)} = \left(\frac{q_1}{q_2}\right)^{n} \left(\frac{p_1 q_2}{p_2 q_1}\right)^{y}. \tag{8}$$

Theorem 1 is thereby verified in the present instance, for (8) exhibits
$r$ explicitly as a contraction of $y$, and $y$ is easily exhibited as a
contraction of $r$ thus:

$$y(x) = \frac{\log \dfrac{r_1 q_2^{\,n}}{r_2 q_1^{\,n}}}{\log \dfrac{p_1 q_2}{p_2 q_1}}. \tag{9}$$

In this example, $y$ is, in view of (8) and (9), equivalent to the
likelihood ratio.
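The mutual recoverability of the count and the likelihood ratio in this example can be confirmed numerically. The sketch below assumes two arbitrary illustrative values of $p$ and a small $n$; the function names are hypothetical. It computes the ratio $P(x \mid B_1)/P(x \mid B_2)$ from the count $y$, then inverts the relation by taking logarithms.

```python
from math import isclose, log


def likelihood_ratio(y, n, p1, p2):
    """P(x | B_1) / P(x | B_2) for n independent 0/1 observations with y ones."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (q1 / q2) ** n * ((p1 * q2) / (p2 * q1)) ** y


def recover_count(ratio, n, p1, p2):
    """Solve the relation above for y; requires p1 != p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (log(ratio) - n * log(q1 / q2)) / log((p1 * q2) / (p2 * q1))


n, p1, p2 = 10, 0.3, 0.6   # assumed illustrative values
ratios = [likelihood_ratio(y, n, p1, p2) for y in range(n + 1)]
recovered = [recover_count(r, n, p1, p2) for r in ratios]
print(recovered)
```

The inversion fails only when $p_1 = p_2$, in which case the observations are utterly irrelevant and the ratio is constant.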




Exercises 

1. Express $k(\beta(x))$ and $v(F(x))$ in terms of the likelihood ratio thus:

$$\beta(i; r) =_{Df} \frac{r_i\, \beta(i)}{\sum_j r_j\, \beta(j)}, \tag{10}$$


( 11 ) 

( 12 ) 






M) = E mr)) 

T 




2. This extended exercise develops the personalistic and behavioralistic
theory of what, following the objectivistic and verbalistic traditions
of statistics, is called the testing of a simple dichotomy, a type of
decision problem that, though seldom very realistic, is a popular and
instructive example with important implications for more realistic
problems. Verbalistically such a problem is described as that of making
the best guess on the basis of an observation as to whether it is $B_1$
or $B_2$ that obtains. Behavioralistically, this is generally interpreted
as the problem of deciding, on the basis of observation, between two
primary acts one of which is preferable to the other if $B_1$ obtains and
vice versa if $B_2$ does. Here is one topic in which the assumption that
$i$ is confined to two values is rather more than simply a pedagogical
simplification; a reader interested in relaxing the assumption will find
pages 127-130 of [W3] stimulating.

Suppose that F contains only two acts fi and and is dominated by 
neither Let (j^ij 

(a) There is no loss of generality in supposing 


(13) 


^1 =Df 


022 ~ 012 


> 0, ^2 =Df 


011 “ 021 


> 0 , 


which will henceforth be done. That is, it will be supposed that $f_1$ is
appropriate only to $B_1$, and vice versa.
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(b) Show that 

(14) k(p) = for /3(1) > 5i/(5i + § 2 ) = /3o(l) 

3 

= Z) for /3(2) > 82/(31 + 52 ) = /8o(2) 

3 

= ?(4-n + 4>2 i)^( 1) + i(</>i2 + 4>22)/3(2) + | 5i;8(2) - 82/3(1) 

= E «7/3(:;) + I Si^(2) - 82/3(1) |, 

0 


where /5o and the are defined by the context 

(c) I = /c(/5), if and only if /3(^) > This condition ob- 

tains for both ^’s simultaneously, if and only if /3 = /Jq. 

(4) Show that 


(15) fc(/3(r)) = Z) + I 5ir2/3(2) - 820 , 3 ( 1 ) 

^ 3 


where 

( 16 ) 

and that 


= Z <t>i 30 ij , r) for o > 0*(/3, 3o), 

3 




/3o(^)//3(^) _ 

ZMmj)’ 


/Z 


(17) i;(F(x) I /3) = Z + Z I 1 S 2 )/ 3 ( 2 ) - 82 P(r | Bi)^(l) 1 

3 r 

= {«! + 32(1 - 2P(ri < /3o) | Pi) 

- P(r = 0o) I Pi)]}i3(l) 

+ !«2 + 3i[l — 2P(r2 < r2*(^, ^ 0 ) | P 2 ) 

- PO = r*(/3, Po) I B^mm. 

(e) Any derived act f(x) determines a function i assigmng an t to 

each X, i being implicitly defined thus f(a:) = U(^x) Conversely any i 
determines a derived act Show that E{f{x) | /5) = v(¥(x) \ 13), if and 
only if /5o) for every x Such a function z{x) is called 

a likelihood-ratio test associated with r* Show that at least one likeli- 
hood-ratio test is associated with every value of r* and that if P(r = r*) 
= 0 (which IS typically the case) there is only one 

(f) If $f(x)$ is determined by a function $i$ of $x$, the probability of
deciding on the inappropriate value of $i$ in case $B_j$ obtains is
generally called the probability of an error of the $j$-th kind.
Analytically the probabilities of error of the first and second kind are,
respectively,

$$e_1 =_{Df} P(i(x) = 2 \mid B_1), \qquad e_2 =_{Df} P(i(x) = 1 \mid B_2). \tag{18}$$

If $i^*$ is a likelihood-ratio test associated with $r^*$, show that its
errors of the first and second kind are subject to the bounds

$$P(r_1 < r_1^* \mid B_1) \le e_1^* \le P(r_1 \le r_1^* \mid B_1), \tag{19}$$

$$P(r_1 > r_1^* \mid B_2) \le e_2^* \le P(r_1 \ge r_1^* \mid B_2). \tag{20}$$

What about the typical case that $P(r = r^*) = 0$?

(g) Show that, if i is at least as good as i* in the sense that e^ < 
for both z’s, then i is a likelihood-ratio test and i is viitually i* in that 
for both t^s Hint Consider an F and a /3 for which r*(/5, ^o) 
== showing that these exist, and note that, foi this decision piobtem. 


/3) = {61 - do(l ~ 2ei*)}^(l) + {62 - 5i(l - 2 c2*)}/S(2) 


( 21 ) 


= 2;(F(x) I /S) 

S(fi \l3) = ln- 32(1 - 26i)};8(1) +{ 62 - 5i(l - 262)1/3(2) 

> viFix) I 


with equality if and only if $i$ is a likelihood-ratio test.
This important conclusion about likelihood-ratio tests has been much
emphasized, especially by the Neyman-Pearson school.


The concept of likelihood ratio, sometimes simply called likelihood, is
now one of the most pervasive concepts of statistical theory. It seems to
have been introduced in 1922 by R. A. Fisher (cf. index of [F3]), who
emphasized it in connection with the important method of estimation named
by him "the method of maximum likelihood." Its use in testing hypotheses
was apparently first emphasized by J. Neyman and E. S. Pearson (see Vol.
II, p. 303 of [K2]). In connection with likelihood ratios as necessary
and sufficient statistics, mathematically advanced readers will be
interested in Section 6 of [L6], [B2], and [M5]. One of the earliest
contributions in this direction was made by C. A. B. Smith [S14].


6 Repeated observations 

If $x(n) = \{x_1, \ldots, x_n\}$, where, given $B_i$, the $x_r$'s are
independent identically distributed random variables, then
$v(F(x(n)))$ is a non-decreasing function of $n$, for the $(n + 1)$-tuple
is an extension of the $n$-tuple. If $k(\beta)$ is strictly convex (a
condition that you now recognize as interesting), $v(F(x(n)))$ is easily
seen to be strictly increasing in $n$, unless the individual $x_r$'s are
either utterly irrelevant or definitive.

It is to be expected, especially in the light of the approach to
certainty discussed in § 3.6, that, as $n$ becomes very large, $x(n)$
will become practically definitive. Indeed, § 3.6 makes it possible to
state and prove a formal theorem to that effect.

Theorem 1

Hyp. 1. $x(n) = \{x_1, \ldots, x_n\}$, where, given $B_i$, the $x_r$'s
are independent and identically distributed random variables.

2. The $x_r$'s are not utterly irrelevant to $B_i$.

3. $v(F \mid \beta) = k(\beta)$.

Concl. $\lim_{n \to \infty} v(F(x(n)) \mid \beta) = l(\beta) =_{Df} \beta(1)\, k(1, 0) + \beta(2)\, k(0, 1)$
uniformly in $\beta$.

Proof. Writing $x$ as short for $x(n)$,

$$v(F(x) \mid \beta) = E[k(\beta(x))]. \tag{1}$$

For an arbitrary $\epsilon > 0$, let the closed interval $I$ on which $k$
is defined be partitioned into two subsets $J$ and $K$, where $J$ is the
set of those $\beta$'s such that

$$k(\beta) \ge l(\beta) - \epsilon, \tag{2}$$

and $K$ is the complement of $J$ relative to $I$.

It follows from the continuity of the functions on each side of (2)
that $\beta \in J$, if either component of $\beta$ is sufficiently large.

The computation initiated in (1) can now be carried forward thus 

(3) ^/[^(^(X))] = £[fc(/3(x)) I Kx(s)) mPims)) £ J) 

+ E[mx)) 1 Km e KlPmis)) e K) 

> Emx)) I Km mp(Km m 

+ mm k(K)-P(^ixis)) sK) - e 

/S' 

= Eimm - {E[i(m l Km 

— mm k(0)}P{p{x(s)) eif) — e 

/S' 

> l(fi) — max j kip') |•P03(a;(s)) eK) — e, 

/S' 

Now, in view of the paragraph in which (3.6.15) occurs and the fact
that, if either component of $\beta$ is close to 1, $\beta \in J$;
$P(\beta(x(s)) \in K)$ becomes arbitrarily small for sufficiently large
$n$.
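The behavior described by the theorem can be observed in a small exact computation. The sketch below assumes, for illustration only, 0/1 observations with success probability 0.6 under $B_1$ and 0.4 under $B_2$, equal a priori probabilities, and two acts paying one unit of utility when the corresponding hypothesis obtains and nothing otherwise; for this problem the limiting value $\beta(1)k(1,0) + \beta(2)k(0,1)$ is 1. All names are hypothetical. The expected value of $k$ at the a posteriori probabilities is summed over the count of ones, which is sufficient for the observations.

```python
from math import comb


def k(beta1, payoffs):
    """k(beta): best expected utility; payoffs[a] = (utility under B_1, utility under B_2)."""
    return max(u1 * beta1 + u2 * (1.0 - beta1) for u1, u2 in payoffs)


def value_of_n_observations(n, beta1, p1, p2, payoffs):
    """v(F(x(n)) | beta) as the sum over counts y of P(y) k(beta(y))."""
    total = 0.0
    for y in range(n + 1):
        joint1 = beta1 * comb(n, y) * p1**y * (1 - p1) ** (n - y)
        joint2 = (1 - beta1) * comb(n, y) * p2**y * (1 - p2) ** (n - y)
        p_y = joint1 + joint2
        if p_y > 0:
            total += p_y * k(joint1 / p_y, payoffs)
    return total


payoffs = [(1.0, 0.0), (0.0, 1.0)]   # assumed two-act problem
sizes = [0, 1, 2, 5, 20, 100, 200]
values = [value_of_n_observations(n, 0.5, 0.6, 0.4, payoffs) for n in sizes]
for n, v in zip(sizes, values):
    print(n, round(v, 4))
```

Since this $k$ is polygonal rather than strictly convex, the value is non-decreasing but not strictly increasing: one and two observations are worth the same here.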
7 Sequential probability ratio procedures

The present section digresses to discuss an interesting application of
the ideas presented in this chapter to what is called sequential
analysis. Sequential analysis refers in principle to the theory of
observational programs in which the selection of what observations to
make in later phases of the program depends on what has been observed in
earlier phases. Such behavior is commonplace in everyday life; for
example, you look for something until you find it, but not longer.
Statistics itself has always used sequential procedures. For example, it
is not rare to conduct a preliminary experiment to determine how a main
experiment should be carried out. Thus, if one were required to estimate
with a roughly preassigned precision the mean of a normal distribution of
unknown mean and unknown variance, one might reasonably begin by taking
ten or twenty observations, which would give some idea of the variance
and would therefore determine about how many observations are necessary
for achieving the requisite precision.

Commonplace though problems with sequential features are, A. Wald was
the first to develop (1943) a systematic theory of a considerable body of
problems of this sort. For early history see the Introduction of [W2] and
the Foreword of Section I of [S17].

Some later ideas on sequential analysis, due mainly to Wald and
Wolfowitz, are the subject of this section. It will not be practical to
proceed with full rigor, primarily because random variables capable of
assuming an infinite number of values are necessarily involved. Full
details are given in [W3] and more compactly in [A7], but not in Wald's
book on sequential analysis [W2].

Let $x = \{x(1), \ldots, x(\nu), \ldots\}$, where the $x(\nu)$'s are
conditionally an infinite sequence of independent, relevant, identically
distributed random variables. Rather informally, a sequential
observational program with respect to $x$ is a rule telling whether to
observe $x(1)$ or whether to make no observation at all; if the
particular value $x(1)$ is observed, whether to observe $x(2)$ or to
discontinue observation; if the values $x(1)$ and $x(2)$ are observed,
whether to observe $x(3)$ or to discontinue observation; etc.

More formally, let $N$ be a function of the infinite sequence of values
$x = \{x(1), \ldots, x(\nu), \ldots\}$ such that, if the sequence $x'$
agrees with $x$ in every component from the first through the $N(x)$th,
then $N(x') = N(x)$. Such a function $N$ determines a sequential
observational program, which is a contraction of $x$, call it
$y(x, N)$, defined thus:

$$y(x, N) =_{Df} \{x(1), \ldots, x(N(x))\}. \tag{1}$$
It is to be understood that, if $N(x)$ is zero for some $x$, it is
identically zero, and that $y(x, 0)$ is a null observation.

It will be assumed that the random cost associated with a sequential
observational program is proportional to the number of random variables
observed, that is, $c = N(x)\gamma$, $\gamma > 0$. No categorical defense
of this assumption is suggested, but clearly there are interesting
problems in which it is met at least approximately. The domain of
applicability of the theory can actually be considerably extended by
modifying the assumption to include a fixed overhead cost that applies
except in case $N$ is identically zero; this does not greatly complicate
the analysis, as the interested reader will be able to see for himself.
The theory would even remain virtually unchanged, if $c$ were only
assumed to be of the form

$$c = \begin{cases} h + \sum_{\nu=1}^{N(x)} c(\nu), & \text{if } N > 0, \\ 0, & \text{if } N = 0, \end{cases} \tag{2}$$

where $h, c(1), c(2), \ldots$ are independent with finite expected values
$E(h) > 0$, $E(c(\nu)) > 0$, and the $c(\nu)$'s are identically
distributed.

For any $F$ there are some values of $\beta$ for which it would be
unwise to adopt any sequential observational program other than the null
observation. Suppose, for example, that $\beta$ is so close to an extreme
value that $l(\beta) - k(\beta) < \gamma$; under this circumstance the
most that could be gained by observing even $x$ itself would be less than
$\gamma$, but the cost of making so much as one observation is at least
$\gamma$. Let the set of values of $\beta$ for which it is not justified
to make any but the null observation be denoted for a while by
$J(F, \gamma)$, or simply $J$, for short.

Now, if $\beta \in J$, the person's utility can, by the definition of
$J$, be maximized by refraining from any observation but the null
observation and accepting the utility $k(\beta)$; otherwise there will be
some advantage to him in observing $x(1)$. If the person does observe the
particular value $x(1)$ of $x(1)$, he finds himself with a posteriori
probabilities $\beta(x(1))$ in place of the a priori $\beta$; he has paid
(or at any rate entailed) a cost $\gamma$, and he must now decide whether
to make any further observations. His new problem is simply the problem
he would have faced at the outset had his a priori probabilities been
$\beta(x(1))$ instead of $\beta$, except that all utilities are now
reduced by $\gamma$. He justifiably accepts the utility
$k(\beta(x(1))) - \gamma$, if $\beta(x(1)) \in J$; otherwise he will
observe $x(2)$. Continuing this line of argument step after step, it
follows that optimal action consists in observing successive
$x(\nu)$'s until an a posteriori probability in $J$ occurs, and then
adopting a basic act consistent with the a posteriori probability.
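The stopping rule just described can be sketched in a few lines. The following is an illustration under stated assumptions, not a determination of the optimal $J$: it supposes 0/1 observations, and it supposes that $J$ consists of two end intervals of values of $\beta(1)$, below a threshold `lower` and above a threshold `upper`, both chosen arbitrarily here. The function names and all numerical values are hypothetical.

```python
def posterior_b1(prior1, ones, zeros, p1, p2):
    """A posteriori probability of B_1 after the given numbers of ones and zeros."""
    like1 = p1**ones * (1 - p1) ** zeros
    like2 = p2**ones * (1 - p2) ** zeros
    return prior1 * like1 / (prior1 * like1 + (1 - prior1) * like2)


def run_program(observations, prior1, p1, p2, lower, upper):
    """Observe until beta(1) enters the assumed J = [0, lower] U [upper, 1].

    Returns (N, final beta(1), chosen hypothesis); the choice is None if the
    supplied observations run out before J is reached.
    """
    stream = iter(observations)
    ones = zeros = n = 0
    beta1 = prior1
    while lower < beta1 < upper:
        try:
            x = next(stream)
        except StopIteration:
            return n, beta1, None
        ones += x
        zeros += 1 - x
        n += 1
        beta1 = posterior_b1(prior1, ones, zeros, p1, p2)
    return n, beta1, 1 if beta1 >= upper else 2


# Assumed illustrative values: p = 0.6 under B_1, 0.4 under B_2.
print(run_program([1] * 20, 0.5, 0.6, 0.4, 0.1, 0.9))
print(run_program([], 0.95, 0.6, 0.4, 0.1, 0.9))
print(run_program([0, 1, 0, 1], 0.5, 0.6, 0.4, 0.1, 0.9))
```

The cost of observation does not appear explicitly in this sketch; in the theory it enters only through the determination of $J$, which is here simply assumed.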
In actual practice, it is far from easy to determine whether a
particular value of $\beta$ belongs to $J(F, \gamma)$, because in
principle the whole enormous variety of sequential observational programs
has to be explored to determine whether any one of them has a derived
value greater than $k(\beta)$. The practical advantage achieved in the
preceding paragraph is that of greatly restricting the class of programs
that merit consideration. Thus the problem of determining whether
$\beta \in J(F, \gamma)$ does not require a survey of all observational
programs, but only of those defined in terms of some set $J'$ according
to the rule that $N(x)$ is the first integer $n$ for which
$\beta(x(1), \ldots, x(n)) \in J'$.

If programs corresponding to all sets $J'$ had to be examined, the
process would still be mathematically impractical; indeed, in all but
special cases, practical solutions have yet to be found. But, if any
special conditions that $J$ must necessarily satisfy are discovered, only
sets $J'$ satisfying those conditions need be examined. Some very general
conditions are these: $J$ contains the extreme points of $I$; $J$ is
topologically closed, that is, if a value $\beta_0$ is not in $J$, then
the near neighbors of $\beta_0$ are also not in $J$. The first of these
conditions requires no comment, and the second follows easily from the
continuity as a function of $\beta$ of

(3) N))) - tN I ^] - /c(/3) 

These conditions alone do not go far toward narrowing to practical
limits the variety of sets to be explored. Thus far in the development of
the subject, really powerful conditions have been obtained only at the
expense of considerable restrictions on the structure of $F$ or,
equivalently, of $k$.

Suppose, then, that $F$ is dominated by a finite number of acts or,
what amounts to a little less, that the graph of $k$ is polygonal, as it
is for the $k$ graphed in Figure 2.1. Technically, this restriction on
$k$ may be expressed by saying that the interval $I$ is the union of a
finite number of intervals of linearity of $k$. Under the restriction,
relatively much can be concluded about the structure of $J(F, \gamma)$,
for it is true in general, as will be shown in the next paragraph, that
the intersection of $J$ with any interval of linearity of $k$ is a closed
interval.

Suppose, indeed, that $\beta_1$ and $\beta_2$ belong to $J$ and to a
common interval of linearity of $k$, but that $\beta_0$ on the interval
between $\beta_1$ and $\beta_2$ does not belong to $J$. A contradiction
follows according to the following computation, in which $h$ is any act
derived from a sequential observational program, cost included, that is
advantageous at $\beta_0$:

$$\sum_i E(h \mid B_i)\, \beta_0(i) > k(\beta_0), \tag{4}$$

for $h$ is supposed to be advantageous at $\beta_0$; and

$$\sum_i E(h \mid B_i)\, \beta_m(i) \le k(\beta_m), \qquad m = 1, 2, \tag{5}$$

for no derived act is supposed to be advantageous at $\beta_m$, since
$\beta_m \in J$. Since $\beta_0$ is a weighted average, say
$\sum_m \lambda_m \beta_m$, of the $\beta_m$'s, and since $k(\beta)$ is
linear in the interval between $\beta_1$ and $\beta_2$, it follows from
(4) and (5) that

$$\sum_i E(h \mid B_i)\, \beta_0(i) \le k(\beta_0), \tag{6}$$

contradicting (4). The supposition that $\beta_0 \notin J$ has thus been
reduced to absurdity.

The demonstration just given extends directly to $n$-fold problems.
The general conclusion is that the intersection of $J$ with any domain of
linearity of $k$ is convex, so that, if $k$ is polyhedral, $J$ is the
union of a finite number of closed convex sets, each lying wholly in a
domain of linearity of $k$. The practical implications of the conclusion
are enormously greater for twofold than for higher-fold problems, because
twofold problems lead to one-dimensional bounded, closed, convex sets,
which present no great variety, all of them being closed bounded
intervals. But threefold problems, for example, lead to closed bounded
two-dimensional convex sets, a restriction that leaves great room for
variety.

If k is polygonal, the variety of sets J to be surveyed is enormously
reduced, for J must be the union of a known number of intervals, each
of which is confined to a known interval. Suppose that this number is
m; the class of sequential observational programs to be surveyed can
then be characterized by the two end points of each of the m intervals,
except that the possibility that some of the intervals are vacuous must
be borne in mind. Since the extremes of I are necessarily in J, and
therefore necessarily appear as end points of intervals in J, the
exploration has been reduced to a 2(m - 1)-parameter family of
possibilities.

The possibility that m = 1, which almost means that F is dominated
by a single element of itself, is trivial, for then all $\beta$'s are in
J, and observation is never called for. This can be seen in many ways.
In particular, it follows as an illustration of the machinery that has
just been developed, thus: The end points, or extremes, of I are both
in J, as always, and, since m = 1, they are both in the same interval of
linearity of k; therefore the interval between them, namely every value
of $\beta$, lies in J.

The possibility that m = 2 (in ordinary statistical usage, the
sequential testing of a simple dichotomy) is of particular importance.
It occurs typically when F is dominated by two acts, neither of which
dominates the other, as in Exercise 5.2. One of the two acts is
appropriate to one "hypothesis" $B_1$, and the other is appropriate to
$B_2$. In case m = 2, it is easily seen, by methods that have now been
indicated more than once, that each of the two closed intervals that
constitute J has as one end point one of the extremes of I. Neither of
the two intervals can be vacuous, nor can either consist only of a
single point. It is relatively easy to find, at least approximately,
the two values of $\beta$ that determine J(F, I), and the theory of this
situation has correspondingly been brought to a relatively high degree
of perfection; for details, see [S17], [W2], [W3], and [A7].

Following (or at least paraphrasing) Wald [W2], a sequential
observational program characterized by making successive observations
until the a posteriori probabilities fall into some set J, followed by
adopting a basic act appropriate to the a posteriori probability, is
called a sequential probability ratio procedure. The reason for this
nomenclature is that to observe until the a posteriori probabilities
fall into J is to observe until the numbers

$$\beta(i \mid x(1), \ldots, x(N)) \tag{7}$$

lie in a certain set, or, what amounts to the same thing, satisfy
certain conditions. But, the particular value of $\beta$ having been
assigned, this is tantamount to requiring the ratios of probabilities

$$\frac{P(x(1), \ldots, x(N) \mid B_i)}{P(x(1), \ldots, x(N) \mid B_j)} \tag{8}$$

to satisfy certain conditions.

Since (7) and (8) are ways of expressing the likelihood ratio, the
observational program together with the act derived from it might also
be referred to as a sequential likelihood-ratio procedure. Indeed, but
for the precedent established by Wald, that would seem the better
name.

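As an actual example of a sequential probability ratio procedure,
suppose that the distribution of $x(\nu)$ given $B_i$ attaches the
probabilities $p_i$ and $q_i = 1 - p_i$ to the values 1 and 0,
respectively. The expression (8) can in any case be written in the
factored form

$$\prod_{\nu=1}^{N} \frac{P(x(\nu) \mid B_1)}{P(x(\nu) \mid B_2)}, \tag{9}$$

which in the present example is

$$\left(\frac{p_1}{p_2}\right)^{y(N)} \left(\frac{q_1}{q_2}\right)^{N - y(N)} = \left(\frac{q_1}{q_2}\right)^{N} \left(\frac{p_1 q_2}{p_2 q_1}\right)^{y(N)}, \tag{10}$$

where

$$y(N) = \sum_{\nu=1}^{N} x(\nu). \tag{11}$$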

It is noteworthy, in connection with sufficient statistics, that the
condition that the a posteriori probability be in J is in this case
expressible, according to (10), as a condition on y(N) and N.
Specializing the example further, suppose that J is of the sort
appropriate to testing a simple dichotomy. The condition that the a
posteriori probability be in ~J is then expressed by each of the
following equivalent pairs of inequalities, where $\alpha(1)$ and
$\alpha(2)$ are positive numbers such that $\alpha(1) + \alpha(2) < 1$.

$$\begin{aligned} \beta(1 \mid x(1), \ldots, x(N)) &< 1 - \alpha(1), \\ \beta(2 \mid x(1), \ldots, x(N)) &< 1 - \alpha(2). \end{aligned} \tag{12}$$

$$\begin{aligned} \frac{\beta(1) Q}{\beta(1) Q + \beta(2)} &< 1 - \alpha(1), \\ \frac{\beta(2)}{\beta(1) Q + \beta(2)} &< 1 - \alpha(2), \end{aligned} \tag{13}$$

where Q for the moment denotes the likelihood ratio (10).

$$\begin{aligned} Q &< \frac{\beta(2)(1 - \alpha(1))}{\beta(1)\alpha(1)} = Q^*, \\ Q &> \frac{\beta(2)\alpha(2)}{\beta(1)(1 - \alpha(2))} = Q_*, \end{aligned} \tag{14}$$

where $Q^*$ and $Q_*$ are defined by the context. Since, according to
(13), the structure of ~J is superficially determined by three
parameters, say by $\beta(1)$, $\alpha(1)$, and $\alpha(2)$, it is worthy
of some note that the corresponding condition is ultimately expressed in
terms of only two special parameters, $Q^*$ and $Q_*$; this is only
natural, considering that ~J is an open interval determined by its two
end points. The act that would be appropriate to $B_1$ is called for by
values of $Q \ge Q^*$, and the one appropriate to $B_2$ is called for by
values of $Q \le Q_*$.
Thus far, the particular form (10) of the likelihood ratio has not
really been exploited in the calculation, so (14) applies to the testing
of simple dichotomies generally. Taking account of (10), (14) can by
elementary manipulation be put in the following form:

$$\begin{aligned} y(N) &< \{\log Q^* + N \log (q_2/q_1)\} / \log (p_1 q_2 / p_2 q_1), \\ y(N) &> \{\log Q_* + N \log (q_2/q_1)\} / \log (p_1 q_2 / p_2 q_1), \end{aligned} \tag{15}$$

where, for definiteness, it is supposed that $p_1 > p_2$. Thus, the
region in the (N, y) plane determined by ~J, the region in which further
observations are called for, is a band bounded by two parallel lines of
positive slope.
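
The band described by (14) and (15) can be sketched numerically. The
following Python fragment is only an illustration under stated
assumptions: the function names (`sprt_thresholds`, `likelihood_ratio`,
`y_bounds`, `run_sprt`) and the sample values of $p_1$, $p_2$,
$\beta(1)$, $\alpha(1)$, $\alpha(2)$ are hypothetical, the observations
are taken to be independent 0-1 variables as in the example, and
observation stops as soon as $Q \ge Q^*$ or $Q \le Q_*$.

```python
import math


def sprt_thresholds(beta1, alpha1, alpha2):
    """Thresholds (Q_lower, Q_upper) of (14) for a prior (beta1, 1 - beta1)."""
    beta2 = 1.0 - beta1
    q_upper = beta2 * (1.0 - alpha1) / (beta1 * alpha1)
    q_lower = beta2 * alpha2 / (beta1 * (1.0 - alpha2))
    return q_lower, q_upper


def likelihood_ratio(y, n_obs, p1, p2):
    """Likelihood ratio (10) after n_obs observations containing y ones."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (q1 / q2) ** n_obs * (p1 * q2 / (p2 * q1)) ** y


def y_bounds(n_obs, p1, p2, q_lower, q_upper):
    """Lower and upper boundaries (15) for y(N); requires p1 > p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    slope_denominator = math.log(p1 * q2 / (p2 * q1))
    drift = n_obs * math.log(q2 / q1)
    lower = (math.log(q_lower) + drift) / slope_denominator
    upper = (math.log(q_upper) + drift) / slope_denominator
    return lower, upper


def run_sprt(observations, p1, p2, beta1, alpha1, alpha2):
    """Observe until y(N) leaves the band (15); return (decision, N)."""
    q_lower, q_upper = sprt_thresholds(beta1, alpha1, alpha2)
    y = 0
    n_obs = 0
    for n_obs, x in enumerate(observations, start=1):
        y += x
        lower, upper = y_bounds(n_obs, p1, p2, q_lower, q_upper)
        if y >= upper:
            return "B1", n_obs   # act appropriate to B_1
        if y <= lower:
            return "B2", n_obs   # act appropriate to B_2
    return "undecided", n_obs    # data exhausted inside the band


if __name__ == "__main__":
    # Hypothetical data and parameters, chosen only for illustration.
    print(run_sprt([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1], 0.6, 0.4, 0.5, 0.05, 0.05))
```

Both boundaries returned by `y_bounds` share the slope
$\log(q_2/q_1)/\log(p_1 q_2/p_2 q_1)$, which is positive when
$p_1 > p_2$, in agreement with the remark that the band is bounded by two
parallel lines of positive slope.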

8 Standard form, and absolute comparison between observations 

If x and y are such that, for every F and $\beta$,
$v(F(x) \mid \beta) \ge v(F(y) \mid \beta)$, then x imitates, so to speak,
an extension of y, and it may appropriately be said that x is a virtual
extension of y. Correspondingly, if x is a virtual extension of y, and y
is a virtual extension of x, it may be said that x and y are virtually
equivalent.

No matter what a priori probabilities a person may have, or what
basic acts are available to him, he will have no preference between a
pair of virtually equivalent observations, so virtually equivalent
observations are indeed equivalent for many practical purposes. Where
combinations of observations are under consideration, however, the
relation of virtual equivalence does not resemble true equivalence. For
example, if x and y are equivalent, then each is equivalent to the
multiple observation {x, y}, but if x and y are only virtually
equivalent, they may well be independent, in which case neither will
typically be equivalent to {x, y}.

This section explores the notions of virtual extension and virtual
equivalence. In particular, an interesting standard representative of
the class of observations virtually equivalent to a given observation x
is defined and discussed. This material is scarcely referred to later in
the book, and it may without much loss be skipped or glossed over. It
will be couched frankly in the language of n-fold as opposed to twofold
partitions, but readers with the rest of the chapter behind them will
easily be able to concentrate on the twofold situation, if they find it
more understandable.

Most of the ideas to be presented in this section were originated by
H. F. Bohnenblust, L. S. Shapley, and S. Sherman in a private
memorandum dated August 1949, which I was privileged to see at that
time. This work was extended and brought to the attention of the public
by David Blackwell in [B16].

It is obvious that, if y is a sufficient statistic for x, then x and y
are virtually equivalent. In particular the likelihood ratio r derived
from x is virtually equivalent to x. Moreover, the reader may
anticipate, and it will be formally shown in the course of this section,
that if and only if observations are virtually equivalent do their
likelihood ratios have the same distribution for every value of
$\beta$, or, what comes to the same thing, given each $B_i$,
$i = 1, \ldots, n$. Thus the n conditional distributions of the
likelihood ratio given each $B_i$ could be taken to characterize the
observations virtually equivalent to a given one, say x. Actually, as
will be shown, the class of observations virtually equivalent to x can
be represented by the distribution of the likelihood ratio for any
single non-extreme value of $\beta$. For definiteness, the particular
value $\beta^* = \{1/n, \ldots, 1/n\}$ will be used, but the interested
reader will find it a simple exercise to extend all the considerations
based on $\beta^*$ to any other non-extreme $\beta$, as would be
necessary in any extension of the theory to infinite partitions.

Let m(r) be the probability that the likelihood ratio in the standard
form (5.5) attains the particular value r when $\beta = \beta^*$. With
self-evident abbreviations,

$$\begin{aligned} m(r) &= P(r \mid \beta^*) \\ &= \frac{1}{n} \sum_j P(r \mid B_j) \\ &= \frac{1}{n} \sum_j \sum_{r(x) = r} P(x \mid B_j). \end{aligned} \tag{1}$$

The second line of (1) exhibits m(r) expressed in terms of the n
distributions $P(r \mid B_i)$. It is rather more interesting to see that
those n distributions can themselves all be expressed in terms of the
single distribution m, as follows from the definition (5.5) of r and the
third line of (1) thus:

$$\begin{aligned} P(r \mid B_i) &= \sum_{r(x) = r} P(x \mid B_i) \\ &= r_i \sum_{r(x) = r} \sum_j P(x \mid B_j) \\ &= n r_i m(r). \end{aligned} \tag{2}$$
Similarly,

$$P(r \mid \beta) = \sum_i \beta(i) P(r \mid B_i) = n\, m(r) \sum_i r_i \beta(i). \tag{3}$$
Regarded as a probability measure on the set of all n-tuples of
numbers r, m has the following three important properties:

$$\begin{aligned} P(r_i \ge 0 \mid m) &= 1, \\ P\Big(\sum_i r_i = 1 \,\Big|\, m\Big) &= 1, \\ E(r_i \mid m) &= n^{-1}. \end{aligned} \tag{4}$$

Of these, the first two are obvious from the definition of r, and the
third follows by calculation from (2) thus:

$$\begin{aligned} 1 = \sum_r P(r \mid B_i) &= n \sum_r r_i m(r) \\ &= n E(r_i \mid m). \end{aligned} \tag{5}$$

Conversely, suppose that m is any mathematical probability defined
on the set of n-tuples r of numbers, subject to the conditions (4); then,
as can easily be verified, n mathematical probabilities are formally
defined by the equation $P(r \mid B_i) = n r_i m(r)$. Mathematically, r
distributed thus can be regarded as an observation. The following
calculation demonstrates the expected conclusion that the likelihood
ratio of this observation is the observation itself and that its
distribution given $\beta^*$ is m.

$$\begin{aligned} \frac{P(r \mid B_i)}{\sum_j P(r \mid B_j)} &= \frac{n r_i m(r)}{n \sum_j r_j m(r)} = r_i, \\ P(r \mid \beta^*) &= \sum_j n r_j m(r)(1/n) = m(r). \end{aligned} \tag{6}$$
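
The passage from an observation to its standard form m, and back to the
conditional distributions by way of $P(r \mid B_i) = n r_i m(r)$, can be
checked on a small finite case. The sketch below is an illustration, not
part of the theory: the names `standard_form` and `conditional_from_m`
are hypothetical, each $P(x \mid B_i)$ is supplied as a dictionary over
the finitely many values of x, and the standard form (5.5) is assumed to
be $r_i(x) = P(x \mid B_i) / \sum_j P(x \mid B_j)$, as (6) indicates.

```python
from collections import defaultdict
from fractions import Fraction


def standard_form(likelihoods):
    """Return m as a dict {r: m(r)}; likelihoods[i][x] is P(x | B_i)."""
    n = len(likelihoods)
    m = defaultdict(Fraction)
    for x in likelihoods[0]:
        total = sum(likelihoods[j][x] for j in range(n))
        if total == 0:
            continue  # x has probability zero given every B_j
        r = tuple(likelihoods[i][x] / total for i in range(n))
        m[r] += total / n  # third line of (1)
    return dict(m)


def conditional_from_m(m, i):
    """Recover P(r | B_i) = n * r_i * m(r), the last line of (2)."""
    n = len(next(iter(m)))
    return {r: n * r[i] * prob for r, prob in m.items()}


if __name__ == "__main__":
    # Hypothetical twofold example with three possible values of x.
    example = [
        {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)},
        {"a": Fraction(1, 4), "b": Fraction(1, 8), "c": Fraction(5, 8)},
    ]
    for r, prob in standard_form(example).items():
        print(r, prob)
```

In the example, the values "a" and "b" have the same likelihood ratio and
are merged in m, so the observation and the contraction that lumps "a"
with "b" (a sufficient statistic) lead to the same m.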

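It is interesting and fruitful to compute $v(F(x) \mid \beta)$ in terms
of m.

$$\begin{aligned} v(F(x) \mid \beta) &= E(k(\beta(x)) \mid \beta) \\ &= E\Big[k\Big(\Big\{r_i \beta(i) \Big/ \sum_j r_j \beta(j)\Big\}\Big) \,\Big|\, \beta\Big] \\ &= n E\Big[k\Big(\Big\{r_i \beta(i) \Big/ \sum_j r_j \beta(j)\Big\}\Big) \sum_j r_j \beta(j) \,\Big|\, m\Big]. \end{aligned} \tag{7}$$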

Temporarily adopt the convention that, if $\alpha$ is any n-tuple of
positive numbers and h any function of r (not necessarily convex),
$T(\alpha)h$ is a function of r defined thus:

$$T(\alpha)h(r) = h\Big(\Big\{r_i \alpha(i) \Big/ \sum_j r_j \alpha(j)\Big\}\Big) \sum_j r_j \alpha(j). \tag{8}$$

Then (7) takes the abbreviated form

$$E(k(\beta(x)) \mid \beta) = n E(T(\beta)k(r) \mid m). \tag{9}$$
To see the implications of (9), it is necessary to know something about
what the operation $T(\beta)$ does to the function k, in particular to
know that $T(\beta)k$ is convex in r. The derivation of these necessary
facts is straightforward and is left to the reader as a sequence of
exercises.


Exercises 

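1a. $T(\alpha)T(\beta)h = T(\{\alpha(1)\beta(1), \ldots, \alpha(n)\beta(n)\})h = T(\beta)T(\alpha)h$.

1b. $h = T(\{\alpha(1)^{-1}, \ldots, \alpha(n)^{-1}\})T(\alpha)h$.

2. $T(\beta)k(\beta^*) = k(\beta)/n$.

3. If $h(r) \ge g(r)$ for r between $r'$ and $r''$, then
$T(\alpha)h(r) \ge T(\alpha)g(r)$ for r between
$\{r'_i/\alpha(i)\} / \sum_j \{r'_j/\alpha(j)\}$ and
$\{r''_i/\alpha(i)\} / \sum_j \{r''_j/\alpha(j)\}$.

4. If h is linear, then so is $T(\alpha)h$.

5. If h is convex (strictly convex), then so is $T(\alpha)h$.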
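
Exercise 5 is obvious in the light of Exercises 3 and 4, but some may
prefer the demonstration suggested by the following calculation, where
$\lambda + \mu = 1$, $\lambda, \mu \ge 0$, and obvious abbreviations are
used ($\alpha r$ for the n-tuple $\{\alpha(i) r_i\}$ and
$\alpha \cdot r$ for $\sum_i \alpha(i) r_i$).

$$\begin{aligned} T(\alpha)h(\lambda r + \mu r') &= h\left(\frac{\lambda\, \alpha \cdot r}{\alpha \cdot (\lambda r + \mu r')} \frac{\alpha r}{\alpha \cdot r} + \frac{\mu\, \alpha \cdot r'}{\alpha \cdot (\lambda r + \mu r')} \frac{\alpha r'}{\alpha \cdot r'}\right) \alpha \cdot (\lambda r + \mu r') \\ &\le \lambda h\left(\frac{\alpha r}{\alpha \cdot r}\right) \alpha \cdot r + \mu h\left(\frac{\alpha r'}{\alpha \cdot r'}\right) \alpha \cdot r' \\ &= \lambda T(\alpha)h(r) + \mu T(\alpha)h(r'). \end{aligned} \tag{10}$$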


It is amusing to establish once more that observation generally pays,
this time by means of (10), (4), and Exercises 5 and 2.

$$\begin{aligned} n E(T(\beta)k(r) \mid m) &\ge n T(\beta)k(E(r \mid m)) \\ &= n T(\beta)k(\beta^*) \\ &= k(\beta). \end{aligned} \tag{11}$$
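
The inequality (11) can be illustrated numerically. The sketch below
rests on several assumptions that are not in the text: k is taken to be
the upper envelope $k(\beta) = \max_f \sum_i \beta(i) f_i$ of a small,
hypothetical set of basic acts `ACTS` (each act listed by its utilities
under $B_1$ and $B_2$), the distribution `M` is a hypothetical twofold
standard form satisfying (4), and the names `k`, `T`, and
`value_with_observation` are chosen only for this illustration.

```python
from fractions import Fraction as Fr

# Hypothetical basic acts, each given as (utility under B_1, utility under B_2).
ACTS = [(Fr(1), Fr(0)), (Fr(0), Fr(1))]

# Hypothetical standard form m for a twofold problem; it satisfies (4).
M = {
    (Fr(2, 3), Fr(1, 3)): Fr(9, 16),
    (Fr(2, 7), Fr(5, 7)): Fr(7, 16),
}


def k(beta):
    """Value of the basic decision problem at beta (a convex function)."""
    return max(sum(b * u for b, u in zip(beta, act)) for act in ACTS)


def T(alpha, h):
    """The operation (8): return the function T(alpha)h."""
    def transformed(r):
        weight = sum(ri * ai for ri, ai in zip(r, alpha))
        if weight == 0:
            return Fr(0)
        point = tuple(ri * ai / weight for ri, ai in zip(r, alpha))
        return h(point) * weight
    return transformed


def value_with_observation(m, beta):
    """Right-hand side of (9): n * E(T(beta)k(r) | m)."""
    n = len(beta)
    transformed_k = T(beta, k)
    return n * sum(prob * transformed_k(r) for r, prob in m.items())


if __name__ == "__main__":
    beta = (Fr(1, 2), Fr(1, 2))
    print(value_with_observation(M, beta), ">=", k(beta))
```

For these particular numbers the value with the observation exceeds
$k(\beta)$ strictly; for an utterly irrelevant observation, with m
concentrated at $\beta^*$, the two sides coincide by Exercise 2.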

If x and x' are observations and m and m' are the corresponding
distributions, it is now easy to say in terms of m and m' when x is
utterly irrelevant, when it is definitive, and when x is virtually an
extension of x'.


More exercises 

6. The observation x is utterly irrelevant, if and only if
$P(r = \beta^* \mid m) = 1$.

7. The observation x is definitive, if and only if
$P(r_i = 1 \mid m) = 1/n$, or, equivalently, if and only if
$P(r_i = 0 \mid m) = (n - 1)/n$.

8a. The observation x is a virtual extension of x', if and only if, for
every convex function h defined for r,

$$E(h(r) \mid m) \ge E(h(r) \mid m'). \tag{12}$$

8b. The two observations are virtually equivalent, if and only if, for
every convex function h,

$$E(h(r) \mid m) = E(h(r) \mid m'). \tag{13}$$

The conclusion reached in Exercise 8b can be much improved. Indeed,
it will be shown that the two observations are virtually equivalent, if
and only if m and m' are the same probability measures. This will be
achieved if, for example, it is shown that m and m' have the same
moments, for it is well known that two different countably additive
probability measures confined to a bounded set of n-tuples of numbers
cannot have the same moments.† The moments in question are expected
values of monomials of the form

$$g(r) = \prod_i r_i^{e_i}, \tag{14}$$

where the $e_i$'s are non-negative integers. In general, g will not be
convex, so it cannot be concluded immediately that g has the same
expected value with respect to m and m'. If, however, a highly convex
function is added to g, then the sum will be convex and its expected
value will be the same with respect to m and m'. Since, by hypothesis,
this is also true of the convex term of the sum, it must also be true of
the not necessarily convex term. Specifically, let

$$h(r) = g(r) + \lambda \sum_j r_j^2, \tag{15}$$

where $\lambda$ is a positive number to be determined later. To test h
for convexity, let s be for the moment an arbitrary n-tuple of numbers
and $\sigma$ a real variable, and compute the second derivative of
$h(r + \sigma s)$ with respect to $\sigma$ at $\sigma = 0$.

$$\left.\frac{d^2}{d\sigma^2} h(r + \sigma s)\right|_{\sigma = 0} = \sum_{i,j} \frac{\partial^2 g}{\partial r_i\, \partial r_j} s_i s_j + 2\lambda \sum_j s_j^2. \tag{16}$$

Considering that each $r_i$ is between 0 and 1, the absolute values of
the derivatives of g that appear in (16) have a common upper bound, say
A; so, if $\lambda$ is sufficiently large relative to A, h is convex in
the region where each $r_i$ lies between 0 and 1 and is a fortiori
convex in the intersection of that region with the hyperplane
$\sum_i r_i = 1$.

† See, for example, Corollary 1.1, p. 11, of [S13].

Under our usual simplifying assumption that x is confined to a finite
number of values, m is certainly countably additive. Actually, the whole
theory can be developed mutatis mutandis assuming only that the
distribution of x is countably additive on some suitable Borel field.

Now that it has been established that m and m' represent virtually
equivalent observations, if and only if m and m' are identical, it is
apparent that m (or, more exactly, the set of conditional distributions
$P(r \mid B_i) = n r_i m(r)$) is a unique standard form for all
observations virtually equivalent to x.

If x virtually extends y, it is to be expected that, no matter what
reasonable definition of "informative" may be suggested, x will be at
least as informative as y. In particular, it is to be expected that the
information of $B_i$ with respect to $B_j$ (as defined in § 3.6) will be
at least as large for x as for y, which the following calculation
verifies, supposing for simplicity that, for both observations, infinite
information is impossible. The point in question depends on the
convexity of the function h defined by

$$h(r) = r_i(\log r_i - \log r_j), \tag{17}$$

because that information is

$$\begin{aligned} E(\log r_i - \log r_j \mid B_i) = n E[r_i(\log r_i - \log r_j) \mid m]. \end{aligned} \tag{18}$$

The required convexity can be demonstrated much as it was in (15)
for a different function also momentarily called h.

$$\left.\frac{d^2}{d\sigma^2} h(r + \sigma s)\right|_{\sigma = 0} = \frac{1}{r_i r_j^2} (r_j s_i - r_i s_j)^2 \ge 0. \tag{19}$$
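
The identity (18) and the comparison between an observation and a
contraction of it can be checked numerically. The sketch below is an
illustration under stated assumptions: the function names
`information` and `information_via_m` and the example probabilities are
hypothetical, each row of `likelihoods` lists $P(x \mid B_i)$ over the
same finite set of values of x, and infinite information is excluded, as
in the text.

```python
import math


def information(likelihoods, i, j):
    """E(log r_i - log r_j | B_i), computed directly from P(x | B_i)."""
    return sum(
        p_i * math.log(p_i / p_j)
        for p_i, p_j in zip(likelihoods[i], likelihoods[j])
        if p_i > 0
    )


def information_via_m(likelihoods, i, j):
    """Right-hand side of (18): n * E[r_i (log r_i - log r_j) | m]."""
    n = len(likelihoods)
    value = 0.0
    for column in zip(*likelihoods):      # one column per value of x
        total = sum(column)
        if total == 0 or column[i] == 0:
            continue                      # such x contribute nothing
        r = [p / total for p in column]   # likelihood ratio in standard form
        mass = total / n                  # contribution of this x to m(r)
        value += n * mass * r[i] * (math.log(r[i]) - math.log(r[j]))
    return value


# Hypothetical observation x with three values, and a contraction y of x
# obtained by lumping the last two values together.
X_LIKELIHOODS = [[0.5, 0.25, 0.25], [0.25, 0.125, 0.625]]
Y_LIKELIHOODS = [[0.5, 0.5], [0.25, 0.75]]

if __name__ == "__main__":
    print(information(X_LIKELIHOODS, 0, 1), information(Y_LIKELIHOODS, 0, 1))
```

Since x extends y here, the information computed for x should be at
least as large as that computed for y, in either direction (i, j).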



It would be interesting to know whether every virtual extension is
realized by an actual extension, that is, whether whenever x is a
virtual extension of y there exist random variables x' and y' such that
x and x' are virtually equivalent, y and y' are virtually equivalent,
and x' extends y'. To the best of my knowledge that conclusion has thus
far been established only in the case of twofold problems, the
demonstration for that case being given by Blackwell in [B16].



CHAPTER 8 


Statistics Proper 

1 Introduction 

I think any professional statistician, whether or not he found himself
in sympathy with the preceding chapters, would feel that, even allowing
for the abstractness expected in a book on foundations, those chapters
do not really discuss his profession. He would not, I hope, find the
same shortcoming in this and the succeeding chapters, for they are
concerned with what seems to me to be statistics proper. The purpose of
the present short chapter is to explain this transition and to serve as
a general introduction to its successors.

2 What is statistics proper?

So far as I can see, the feature peculiar to modern statistical activity
is its effort to combat two inadequacies of the theory of decision, as I
have thus far discussed it. In the first place, there are the vagueness
difficulties associated with what in § 4.2 were called "unsure
probabilities." Second, there are the special problems that arise from
more than one person participating in a decision.

From the personalistic point of view, statistics proper can perhaps be
defined as the art of dealing with vagueness and with interpersonal
difference in decision situations. Whether this very tentative
definition is justified, later sections and chapters will permit the
statistical reader to judge. At any rate, vagueness and interpersonal
difference are the concepts that, directly or indirectly, dominate the
rest of this book.

I will not try to discuss vagueness in this chapter, but something
may profitably be said here about interpersonal differences.

3 Multipersonal problems 

As I have already frequently said, it seems to me that multipersonal
considerations constitute much of the essence of what is ordinarily
called statistics, and that it is largely through such considerations
that the achievements of the British-American School can be interpreted
in terms of personal probability. This is a view that can best be
defended by illustration, and the requisite illustrations will be
scattered throughout later chapters, but some support is lent to it by
those critics of personal probability who say that personal probability
is inadequate because it applies only to individual people, whereas the
methods of science are, more or less by definition, those methods that
are acceptable to all rational people.

The sort of multipersonal problems I mean to call attention to are
those arising out of differences of taste and judgment, as opposed to
those, so familiar in economics, arising out of conflicting interests.
As a matter of fact, the latter type of multipersonal situation can, if
one chooses, be regarded as among the former; it may, for example, be
said that you and I have different tastes for the process of taking a
dollar from me and giving it to you.

Though modern statisticians do not at all deny the existence of
different tastes in different people, only occasionally do they take
that difference explicitly into account. In particular, the theory of
utility has scarcely ever entered explicitly into the works of
statisticians. Our intellectual ancestors who believed in the principles
of mathematical expectation were less tolerant than modern statisticians
in so far as they denied rationality in those whose tastes departed from
that principle, and some of their bigotry is occasionally met with
today.

In dealing with multipersonal situations, it is clearly valuable to
recognize those in which the people involved may all reasonably be
expected to have the same tastes, that is, utilities, with respect to
the alternatives involved in the situation. Explicit attempts to
discover general circumstances under which people's tastes will be
identical are rare. The most important and fruitful attempt of this sort
is represented by D. Bernoulli's idea that utility functions will
typically be approximately linear within sufficiently confined ranges of
income. Consciously or unconsciously, that principle is repeatedly
appealed to throughout statistics; it was, for example, brought out in
§ 6.5 that the very idea of an observation depends for its practical
value on Bernoulli's principle of approximate linearity.

Relatively inexplicit exploitations of similarity of taste are sometimes
made in statistics. The idea is often expressed, for example, that the
penalty for making an estimate discrepant from the number to be
estimated will, for everyone concerned, be proportional (within a
reasonable range) to the square of the discrepancy; an argument for this
principle as a rule of thumb appropriate to many contexts will be given
in § 15.5. Again, there are situations in which it is agreed that the
penalty will depend only on the discrepancy and not on the true value of
the number to be estimated. Of course, there are problems in which
both rules are invoked simultaneously, the penalty being supposed to
be proportional to the square of the discrepancy and independent of
the value to be estimated.

Turn now to differences in judgment, that is, to differences in the
personal probability, for different people, of the same event. Though
modern objectivistic statisticians may recognize the existence of
differences of judgment, they argue in theoretical discussions that
statistics must be pursued without reference to the existence of those
differences, indeed without reference to judgment at all, in order that
conclusions shall have scientific, or general, validity. To put the same
idea in personalistic terms, I would say that statistics is largely
devoted to exploiting similarities in the judgments of certain classes
of people and in seeking devices, notably relevant observation, that
tend to minimize their differences.

The tendency of observation to bring about agreement has been
illustrated in § 3.6. Some of the other general circumstances in which
different people may be expected to agree, or at least nearly agree, in
some of their judgments have also been mentioned. For example, it
may well happen that different people are faced with partition problems
that are the same in that the same variable is to be observed by
each person, but differ in that each person has his own a priori
probabilities $\beta$ and his own set of available acts F. If, however,
the conditional distribution of x given $B_i$ is the same for each
person, then the people will, for example, agree as to whether a
contraction y of x is sufficient, which is often of great practical
value. Again, there are circumstances under which each of these same
people will agree that certain derived acts are nearly optimal.

4 The minimax theory 

In recent years there has been developed a theory of decision, here
with due precedent to be called the minimax theory, that embraces so
much of current statistical theory that the remaining chapters can
largely be built around it. The minimax theory was originated and
much developed by A. Wald, whose work on it is almost completely
summarized in his book [W3]. Wald's minimax theory, of course, derives
from, and reflects, the body of statistical theory that had been
developed by others, particularly the ideas associated with the names of
J. Neyman and E. S. Pearson. It seems likely that, in the development
of the minimax theory, Wald owed much to von Neumann's treatment
of what von Neumann calls zero-sum two-person games, which, though
conceptually remote from statistics, is mathematically all but identical
with study of the minimax rule, the characteristic feature of the
minimax theory.

Wald in his publications, and even in conversation, held himself
aloof from extramathematical questions of the foundations of statistics,
and therefore many of the opinions expressed in later chapters on such
points in connection with the minimax theory were neither supported
nor opposed by him. It may fairly be said, however, that he was an
objectivist and that his work was strongly motivated by objectivistic
ideas.

My policy here of holding difficulties of mathematical technique to a
minimum by making stringent simplifying assumptions will be adhered
to in connection with the minimax theory. A large part of Wald's book
[W3] is concerned with overcoming the difficulties in technique that are
here avoided by simplifying assumptions, but that must be faced in
many practical problems. Despite Wald's able effort, important problems
of analytic technique still remain in connection with the minimax
theory. It should also be appreciated that the individual mathematical
problems raised by applications of the minimax theory are often very
awkward, even when stringent simplifying assumptions are complied
with; consequently much work on specific applications of the theory is
still in progress.



CHAPTER 9 


Introduction to
the Minimax Theory

1 Introduction 

This chapter explains what the minimax theory is, almost without
reference to the theory of personal probability. This course seems best,
because the theory was originated from an objectivistic point of view
and as the solution of an objectivistic problem. Moreover, a
philosophically more neutral presentation seems to result, if the ideas
of personal probability are here kept out of the foreground.

The minimax theory begins with some of the ideas with which the
theory of personal probability, as developed in this book, also begins.
In particular, the notions of person, world, states of the world,
events, consequences, acts, and decisions presented in §§ 2.2-5 apply as
well to the minimax theory (from which they were in fact derived) as to
the theory of personal probability.

The point at which the two theories depart from each other is § 2.6,
which postulates that the person's preferences establish a simple order
among all acts. That assumption is necessarily rejected by objectivists,
for it, together with the sure-thing principle (which they presumably
accept), implies the existence of personal probability. For
objectivists, of course, conditional probability does not apply to all
ordered pairs of events. More specifically, it seems to be a tacit
assumption of objectivistic statistics that the world envisaged in any
one problem is partitioned into events with respect to each of which the
conditional probabilities of all events (ignoring the mathematical
technicality of measurability considerations) are defined, but that
conditional probabilities with respect to sets other than unions of
elements of the partition are not defined. That, incidentally, is why
partition problems dominate objectivistic statistics. The partition in
question is in general infinite, but, for mathematical simplicity, it
will here be assumed to be a finite partition.

The objectivistic position is not in principle opposed to the concept
of utility. In particular, the minimax theory is predicated on the idea
that the consequences of those acts with which it deals are measured
numerically by a quantity the expected value of which the person
wishes to have as large as possible, whenever (from the objectivistic
point of view) the concept of expected value applies. It will therefore
be doing the minimax theory little or no injustice to postulate here, as
elsewhere, that the consequences of acts are measured in utility.

These preliminaries disposed of, the general objectivistic decision
problem is to decide on an act f in some given F, by criteria depending
only on the conditional expectations $E(f \mid B_i)$, and therefore
without reference to the "meaningless" $P(B_i)$.

Taking any personalistic or necessary point of view literally, it is
nonsensical to pose an objectivistic decision problem, that is, to ask
which f of F is best for the person, without reference to the
$P(B_i)$. On the other hand, many, if not all, holders of objectivistic
views, like Wald, find themselves logically compelled by two widely held
tenets to consider such problems meaningful. First, for reasons I have
alluded to in Chapter 2 and will soon expand upon, many theoretical
statisticians today agree, at least tacitly, that the object, or at any
rate one object, of statistics is to recommend wise action in the face
of uncertainty, a point of view that Wald was particularly active in
bringing to the fore. Second, statisticians of the British-American
School, of which Wald is to be considered a member, are objectivists and
are therefore committed to the view that the probabilities $P(B_i)$ are
meaningless, or, at any rate, that they cannot be legitimately used in
solutions of statistical problems.

So far as I know, Wald is the only one who has proposed any solution
to the general objectivistic decision problem, barring minor variations.
His proposal, which is here called the minimax theory, is rather
complicated to state. In view of its complexity and the importance of
this theory for the rest of this book, and for statistical theory
generally, I hope the reader will have particular patience with the
present chapter.

2 The behavioralistic outlook 

Prior to Wald's formulation of what is here called the objectivistic
decision problem, the problems of statistics were almost always thought
of as problems of deciding what to say rather than what to do, though
there had already been some interest in replacing the verbalistic by the
behavioralistic outlook. The first emphasis of the behavioralistic
outlook in statistics was apparently made by J. Neyman in 1938 in [N3],
where he coined the term "inductive behavior" in opposition to
"inductive inference." In the verbalistic outlook, which still dominates
most everyday statistical thought, the basic acts are supposed to be
assertions, and schemes based on observation are sought that seldom
lead to false, or at any rate grossly inaccurate, assertions.

The verbalistic outlook in statistics seems to have its origin in the
verbalistic outlook in probability criticized in § 2.1, which in turn is
traceable to the ancient tradition in epistemology that deductive and
inductive inference are closely analogous processes.

I, and I believe others sympathetic with Wald's work, would analyze
the verbalistic outlook in statistics thus: Whatever an assertion may
be, it is an act, and deciding what to assert is an instance of deciding
how to act. Therefore decision problems formulated in terms of acts
are no less general than those formulated in terms of assertions.

If, on the other hand, a sufficiently broad interpretation is put on the
notion of assertion, perhaps every decision to adopt an act can be
regarded as an assertion to the effect that that act is the best
available, in which case the difference between the verbalistic and the
behavioralistic outlooks is only terminological; but I do think that,
even under such an interpretation, the behavioralistic outlook with its
tendency to emphasize consequences offers the better terminology.

Fallacious attempts to analyze away the difference between the
verbalistic and behavioralistic viewpoints are also sometimes put
forward, especially in informal discussion. For example, it is sometimes
said that one should act as though his best estimate of a quantity were
in fact the quantity itself. But on that basis few of us would buy life
insurance for next year, for we do not typically estimate the year of
our death to be so close. Other examples are discussed by Carnap in
Section 50 of [C1].

If assertions are, indeed, to be interpreted as a special class of acts
of particular importance to statistics, I have no clear idea what that
class may be; but it would presumably exclude certain acts, such as the
design of an experiment, that surely are of importance to statistics.
Actually the verbalistic outlook has led to much confusion in the
foundations of statistics, because the notion of assertion has been used
in several different, but always ill-defined, senses, and because
emphasis on assertion distracts from the indispensable concept of
consequences. I conclude that the behavioralistic outlook is clearer,
fuller, and better unified than the verbalistic; and that such value as
any verbalistic concept may have it owes to the possibility of one or
more behavioralistic interpretations.

This analysis is really too brief and must be supplemented by certain
remarks. To begin with, the reader may wonder whether the verbalistic
outlook has adherents who defend it against the behavioralistic, and if
so what their arguments may be. Actually, the statistical public seems
to greet the behavioralistic outlook as a relatively new idea (how old
it may actually be is beside the point here), which as such must be
regarded with some skepticism. To the best of my knowledge, however,
only one objection against the behavioralistic outlook has been
presented. It must be discussed next.

It has been seen as an objection to the behavioralistic outlook that
the consequences of some assertions, particularly those of pure science,
are extremely subtle and difficult to appraise. As a function of the true
but unknown velocity of light, what, for example, will be the
consequences of asserting that the velocity of light is between
$2.99 \times 10^{10}$ and $3.01 \times 10^{10}$ centimeters per second?
But, if some acts do have subtle consequences, that difficulty cannot
properly be met by denying that they are acts or by ignoring their
consequences. Certain practical solutions of the difficulty are known.
For example, considerations of symmetry or continuity may, as is
illustrated in Chapters 14 and 15, make a wise decision possible even in
some cases where the explicit consequences of the available acts are
beyond human reckoning. Again, analysis sketched in the next two
paragraphs tends to show that assertions with extremely subtle
consequences play a smaller role in science and other affairs than might
at first be thought.

No worker would actually publish (indeed no journal would accept) as
research the hypothetical assertion about the velocity of light
mentioned in the paragraph above. The consequences might be subtle, if
he did, but they would not be very important, for no one would take
him seriously. An actual worker would do as much as was practical
to say what observations relevant to the velocity of light he, and
perhaps others, had performed and what had been observed. To be sure,
his statement of the observations would typically be much condensed;
he would resort to sufficient statistics or other devices to put his
reader rapidly in position to act as though the reader himself had made
the observations. Assertions about the velocity of light, and countless
others of that sort, are of course published in textbooks and handbooks.
These assertions do indeed have complicated consequences, so judgment
is called for in the compilation of such books; but the seriousness of
the consequences of their assertions is limited because of the
possibility of referring to original research publications, a
possibility serious textbooks and handbooks facilitate by the inclusion
of bibliographies.

On the other hand, it is obvious that many problems described
according to the verbalistic outlook as calling for decisions between
assertions really call only for decisions between much more
down-to-earth acts, such as whether to issue single- or double-edged
razors to an army, how much postage to put on a parcel, or whether to
have a watch readjusted.

It is time now to turn back to objectivistic decision problems.

3 Mixed acts

Speaking with pedantic strictness, it might be said that Wald does
not propose a solution for the general objectivistic decision problem,
because, before undertaking a solution, he insists that F be subject to
a certain condition. On the other hand, he argues that the condition
is typically met in practice; he might fairly have insisted that it is
the very heart of much actual statistical practice. Before discussing
the issue in detail, let me give a small but typical illustration of it.

Suppose that in a rental library I am confronted with the choice
between two detective stories, each of which looks more horrifying than
the other. At first sight it would seem that only two acts are open to
me, namely, to rent one book or the other; but Wald points out that
there are other possibilities, not ordinarily thought of as such. In
particular, I can eliminate one of the books by flipping a coin. More
accurately and more generally, I can let my choice depend on the outcome
of a random variable that is utterly irrelevant to the fundamental
partition; in this example, a random variable the outcome of which is
independent of the relative merits of the two books. The random variable
may as well be confined at the outset to two values corresponding to
the rental of one or the other of the books; and random variables
assigning the same probabilities to the books are equivalent for the
purpose at hand. In practice, especially serious statistical practice,
such random variables are, taking reasonable precautions, readily
provided by coins, cards, dice, tables of random numbers, and other
devices.

In terms of the general objectivistic decision problem, Wald's point
can (except for mathematical technicalities) be formulated thus: If
$f_r$ represents a finite number of elements of F, and $\phi(r)$ is a
corresponding set of non-negative numbers such that
$\sum_r \phi(r) = 1$, then the person can make the mixed act

(1)    $f = \sum_r \phi(r) f_r$

available to himself by observing at no appreciable cost a random
variable taking the values r with corresponding probabilities $\phi(r)$
irrespective of which $B_i$ obtains; so F may be assumed to include f.
Technically, the sum in (1) should, for full generality, be replaced by
an integral with respect to a probability measure. But such integrals
become superfluous under the simplifying assumption, which is herewith
made, that there are in F a finite set of acts $f_r$, to be called
primary acts, with respect to which every act in F can be represented
in the form (1). In the rental-library example, the two acts
corresponding to the two books can be regarded as primary.

Since mixed acts are also available from the personalistic point of
view, it may well be asked whether it is advantageous to consider them
in connection with that point of view, and, if not, how they can be of
advantage from one point of view but not the other. The answer to
the first part of the question is easy. Indeed, if f is defined by (1)
then it is personalistically impossible that f should be definitely
preferred to every $f_r$, that is, that

(2)    $E(f) = \sum_r \phi(r) E(f_r) > \max_r E(f_r)$,

for a weighted mean cannot be greater than all its terms. Technical
explanation of the efficacy of mixed acts from the objectivistic point
of view can best be presented after the whole statement of the minimax
rule, but those at all familiar with modern statistical practice will
derive some insight from the remark that the usual preference of
statisticians for random samples represents a preference for certain
mixed acts.
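
The remark that a weighted mean cannot be greater than all its terms can
be checked numerically. The following is only a minimal sketch in
Python, not part of the original argument; the function name
`expected_income_of_mixture` and the three expected incomes are
hypothetical, chosen purely for illustration.

```python
from fractions import Fraction

def expected_income_of_mixture(weights, primary_incomes):
    """E(f) for the mixed act f = sum_r phi(r) f_r, given each E(f_r)."""
    # The weights phi(r) must be non-negative and sum to 1.
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    return sum(w * e for w, e in zip(weights, primary_incomes))

# Hypothetical expected incomes E(f_r) of three primary acts.
primary = [Fraction(3), Fraction(5), Fraction(-2)]
# Hypothetical mixing probabilities phi(r).
phi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

mixed = expected_income_of_mixture(phi, primary)
# The mixture can never be strictly better than every primary act.
print(mixed, "<=", max(primary), ":", mixed <= max(primary))
```

Exact fractions are used so that the comparison is not blurred by
rounding; any other non-negative weights summing to 1 could be
substituted.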

4 Income and loss 

It is sometimes suggestive, and in conformity with some statistical
(though not quite with economic) usage, to refer to $E(f \mid B_i)$ as
the income of f when $B_i$ obtains, and, correspondingly, to use the
notation $I(f; i)$. An important concept associated with the income is
that which I shall refer to as the loss (symbolized by $L(f; i)$)
incurred by the act f when $B_i$ obtains. By that I mean the difference
between the income the person could attain if he were able to act with
the certain knowledge that $B_i$ obtained and that which he will attain
if he decides on f when $B_i$ does in fact obtain. Formally,

(1)    $L(f; i) =_{\mathrm{Df}} \max_{f'} I(f'; i) - I(f; i)$.

If the person decides on f when $B_i$ obtains, $L(f; i)$ measures in
terms of income the error he has made. If he were himself informed of
$B_i$ after f had been chosen, which is not typically the case,
$L(f; i)$ would, so to speak, measure his cause for regret. On that
account, some have proposed to call loss "regret," but that term seems
to me charged with emotion and liable to lead to such misinterpretation
as that the loss necessarily becomes known to the person. On the other
hand, the term "loss" has been used by Wald in the sense of negative
income, but in contexts where loss as defined here is, of the two
senses, the only defensible one, as will be explained in § 8. I hope the
sense proposed here will not cause serious confusion.
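
The definition of loss can be made concrete with a small computation.
The sketch below is an illustration only: the function names
`loss_table` and `mixture_loss` and the income figures are hypothetical,
and the maximum over all acts in F is taken over the primary acts alone,
which suffices because the income of a mixed act is a weighted mean of
the incomes of primary acts.

```python
from fractions import Fraction

def loss_table(income):
    """income[r][i] = I(f_r; i); returns L(f_r; i) = max_r' I(f_r'; i) - I(f_r; i)."""
    n_states = len(income[0])
    # Best attainable income for each element i of the partition.
    best = [max(row[i] for row in income) for i in range(n_states)]
    return [[best[i] - row[i] for i in range(n_states)] for row in income]

def mixture_loss(phi, losses):
    """L(f; i) for the mixed act f = sum_r phi(r) f_r: a weighted sum of primary losses."""
    n_states = len(losses[0])
    return [sum(p * row[i] for p, row in zip(phi, losses)) for i in range(n_states)]

# Hypothetical incomes: three primary acts (rows), two events (columns).
income = [[4, -1],
          [2, 2],
          [0, 3]]
losses = loss_table(income)

# Hypothetical mixture giving weight 1/2 to each of the first two acts.
phi = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
print(losses)
print(mixture_loss(phi, losses))
```

Every entry of the loss table is non-negative, and each column contains
at least one zero, namely for an act that is best when that event
obtains.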

Exercises

1. For each i, there is at least one primary act $f_r$ such that

(2)    $I(f_r; i) = \max_f I(f; i)$.

Such a primary act may fairly be called correct for i.

2. $L(f; i) = \sum_r \phi(r) L(f_r; i) \ge 0$, equality holding if and
only if f is a mixture of acts correct for i.

3. $L(f; i) = \max_{r'} I(f_{r'}; i) - I(f; i)$.

4. $L(f; i) = -I(f; i)$, if and only if

(3)    $\max_r I(f_r; i) = 0$.

5 The minimax rule, and the principle of admissibility

The most characteristic feature of the minimax theory is a certain
rule of behavior, or recommendation to the person. This rule, to be
called the minimax rule, can now be formulated thus: Decide on an
act f such that

(1)    $\max_i L(f; i) = \min_{f'} \max_i L(f'; i)$,

where f and f' are, of course, confined to F.

In words, the minimax rule recommends the choice of such an act
that the greatest loss that can possibly accrue to it shall be as small
as possible. An f satisfying the recommendation of the minimax rule will
be called a minimax act, and the greatest loss that can accrue to a
minimax act will be called the minimax value of the (objectivistic)
decision problem and written $L^*$. Under the simplifying assumptions
that have been made, it is not technically difficult to show that at
least one minimax act exists. The statement of the rule can be
reasonably extended to mathematically more general situations, but a
digression about this possibility is not appropriate here. The name of
the rule is presumably derived from the abbreviation "min max" in (1) or
from the Latin phrase "minimum maximorum" thus abbreviated.

It may well happen that F contains more than one act that is minimax
for the problem, in which case the minimax rule recommends, not
a particular act, but only that the choice be narrowed to the set of
minimax acts. Some other criterion must then be invoked to narrow
the choice further. In particular, it can be shown that at least one of
the minimax acts is admissible, in the sense of § 6.4. As Wald
indicates, it would, therefore, be an inexcusable violation of the
sure-thing principle not to narrow the choice to admissible acts. This
application of the sure-thing principle will be called the principle of
admissibility. The minimax rule and the principle of admissibility
constitute the subject matter of, and thereby define, the minimax
theory.

6 Illustrations of the minimax rule 

It would be hard to imagine an objectivistic decision problem simpler
than that of whether to make an even-money (or more accurately,
even-utility) bet in favor of a certain event or to refrain from
betting. That problem, therefore, provides a convenient first example of
the minimax rule and the concepts associated with it. Supposing, as one
may without loss of generality, that the bet is for one utile, the
objectivistic decision problem is completely described by Table 1, which
gives the income of each of the two primary acts for each of the two
elements of the partition corresponding to the event in question and its
complement.

Table 1. The income of an even-money bet, $I(f_r; i)$

                         Event
Act                   B_1     B_2

Bet, f_1               1      -1
Don't bet, f_2         0       0

In view of Exercises 4.2 and 4.3 the corresponding loss function is
described by Table 2. Therefore,

(1)    $\max_i L(f; i) = \max_i \sum_r \phi(r) L(f_r; i)
                       = \max_i \phi(i) \ge \tfrac{1}{2}$,

equality obtaining if and only if $\phi(1) = \phi(2) = \tfrac{1}{2}$.
Therefore, $L^* = \tfrac{1}{2}$, and the only minimax act is
$f = \tfrac{1}{2} f_1 + \tfrac{1}{2} f_2$.

Table 2. The loss of an even-money bet, $L(f_r; i)$

              Event
Act        B_1     B_2

f_1         0       1
f_2         1       0

In this problem, therefore, the minimax rule recommends that the
person decide, in effect, by flipping a fair coin. If the odds in the
bet had not been even, the minimax rule would have recommended the
use of a coin with a certain bias; this more general example will be
worked out in detail in § 12.4. It is noteworthy in connection with the
present problem (for it happens in many others) that, for the minimax
act f, $L(f; i) = L^*$ for every value of i.
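
The minimax act for the even-money bet can also be located by brute
force. The sketch below is an illustration under stated assumptions, not
a proof: it searches only a hypothetical grid of mixing probabilities
k/100 for the weight placed on betting, using the loss table for the two
primary acts.

```python
from fractions import Fraction

# Loss table L(f_r; i): rows are primary acts, columns are events B_1, B_2.
LOSS = [[0, 1],   # f_1: bet
        [1, 0]]   # f_2: don't bet

def max_loss(p):
    """Greatest loss, over both events, of the mixed act p*f_1 + (1 - p)*f_2."""
    phi = (p, 1 - p)
    return max(sum(w * row[i] for w, row in zip(phi, LOSS)) for i in range(2))

# Hypothetical search grid for phi(1); an exact argument needs no grid.
grid = [Fraction(k, 100) for k in range(101)]
best_p = min(grid, key=max_loss)
minimax_value = max_loss(best_p)
print(best_p, minimax_value)
```

On this grid the smallest maximum loss is attained only at the even
mixture, in agreement with the argument above; the grid is merely a
convenience for illustration.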

The following more elaborate example, illustrating the mechanism of
observation, is paraphrased from a slightly incorrect example in [S2].
Of three numbered coins, two are pennies and one is a dime, or else one
is a penny and two are dimes. This gives rise to a sixfold partition
because any of the three coins may be the singular one, and in two ways.
The available primary acts are described in two stages thus: First, the
person may select one of the coins by number for observation, or he
may refrain from so doing; second, he must guess at the denomination
of the singular coin. His income in utiles is defined by the following
conditions:

1. If the singular coin is a penny, he must pay a tax of 10; if it is a
dime, he receives a bonus of 20.

2. If he chooses to observe a coin, he must pay an inspection fee of
1, regardless of the particular coin selected for observation.

3. If his guess is incorrect he pays a penalty of 8.

It is easy to see that the first of the three terms in the person's
income is irrelevant to his loss, since his decision does not affect the
magnitude of that term. His loss is therefore the sum of two terms. The
first of these is 1 or 0 depending on whether he decides to make an
observation; the second is 0 or 8, depending on whether his guess is
correct.

If the person chooses not to pay the inspection fee, it is clear from
the preceding example that, no matter what he does, his loss may be as
high as 4, and that it is certain to be that small if and only if he
governs his guess (essentially) by the flip of a fair coin.

Suppose next that the person decides to make an observation. If
he selects any particular coin for observation, he is as badly off as he
was before the observation, and he has in addition incurred the
inspection fee. Thus, even if the person knows that the first coin is a
penny, there is nothing he can do to be sure that his total loss will
not be more than 5, and, as before, he can guarantee that small a loss
only by governing his guess with the flip of a fair coin.

I think every practicing statistician would say that, if an observation
is to be made at all, one of the three coins should be selected at random
(i.e., the probability 1/3 should be attached to observing each of them)
and after the observation the person should guess that the singular
coin is opposite in denomination to the one observed. It will be shown
in the next paragraph that this common-sense act is minimax.

In the first place, the loss $L(f_0; i)$ for the act $f_0$ in question
is, for each i, equal to $1 + \tfrac{1}{3} \times 8 = 3\tfrac{2}{3}$,
which is less than 4; for the inspection fee is 1 and the probability of
making a wrong guess, which would result in the loss of 8, is 1/3. To
show that $f_0$ is minimax, it will be enough to show that every act can
result in a loss of at least $3\tfrac{2}{3}$. One possibility for doing
this (which in § 12.3 will be shown to be a natural one to try) is to
show that, for a certain set of weights, the weighted average of
$L(f; i)$ with respect to i is at least $3\tfrac{2}{3}$ for all f. In
fact, it is sufficient, in view of Exercise 4.2, to establish such an
inequality for the primary acts. In the present example, it happens that
the weights can be chosen to be equal. What is to be shown, then, is
that the following inequality obtains for every primary f:

(2)    $\bar{L}(f) = \tfrac{1}{6} \sum_i L(f; i) \ge 3\tfrac{2}{3}$.

Now, if the primary act f does not involve observation,
$\bar{L}(f) = 4$, because three of the six terms to be averaged are then
8, and the other three are 0. Suppose next, for definiteness, that f
involves the observation of the first coin; there are then three
possibilities to consider. First, the guess is made without regard for
the denomination observed, in which case the observation is, so to
speak, thrown away, making $\bar{L}(f) = 5$. Second, the denomination
guessed may be the same as the denomination observed, in which case the
guess will be wrong for four of the six values of i, making
$\bar{L}(f) = 6\tfrac{1}{3}$. Finally, the denomination guessed may be
the opposite of the one observed, in which case the guess will be wrong
for two of the six values of i, making $\bar{L}(f) = 3\tfrac{2}{3}$.
This argument shows that $L^* \ge 3\tfrac{2}{3}$; and, since
$L(f_0; i) = 3\tfrac{2}{3}$ for every i, $f_0$ is a minimax act and
$L^* = 3\tfrac{2}{3}$. It would not be difficult to show that $f_0$ is
the only minimax act for this problem.
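
The arithmetic of the three-coin example can be verified by enumerating
the six elements of the partition and the primary acts. The following
sketch is an illustration only; the encoding of states and acts, and all
names such as `STATES`, `RULES`, and `f0_loss`, are assumptions
introduced here. Loss is taken as the inspection fee plus the penalty,
the tax-or-bonus term being irrelevant as explained above.

```python
from fractions import Fraction
from itertools import product

DENOMS = ("penny", "dime")

def other(d):
    return "dime" if d == "penny" else "penny"

# A state is (index of the singular coin, denomination of the singular coin).
STATES = list(product(range(3), DENOMS))

def coin_denomination(k, state):
    """Denomination of coin k when `state` obtains."""
    s, d = state
    return d if k == s else other(d)

# Ways of turning the observed denomination into a guess.
RULES = {
    "always penny": lambda seen: "penny",
    "always dime": lambda seen: "dime",
    "same": lambda seen: seen,
    "opposite": other,
}

def loss(act, state):
    """Inspection fee (1 if a coin is observed) plus penalty 8 for a wrong guess."""
    observed, rule = act
    if observed is None:            # no observation: `rule` is the guess itself
        fee, guess = 0, rule
    else:
        fee, guess = 1, RULES[rule](coin_denomination(observed, state))
    return fee + (8 if guess != state[1] else 0)

PRIMARY = [(None, g) for g in DENOMS] + [(k, r) for k in range(3) for r in RULES]

def average_loss(act):
    """Equal-weight average of the loss over the six states."""
    return Fraction(sum(loss(act, s) for s in STATES), len(STATES))

def f0_loss(state):
    """Loss of f_0: observe a coin chosen at random, then guess the opposite."""
    return Fraction(sum(loss((k, "opposite"), state) for k in range(3)), 3)

print(sorted({average_loss(a) for a in PRIMARY}))
print({s: f0_loss(s) for s in STATES})
```

Since no primary act has average loss below 11/3, no mixture can have a
maximum loss below 11/3, while the randomized act attains exactly 11/3
in every state.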

7 Objectivistic motivation of the minimax rule 

The minimax rule recommends an act for the person to choose; more
strictly, it recommends a sharp narrowing of his choice. But how can
this particular recommendation be motivated? To the best of my
knowledge no objectivistic motivation of the minimax rule has ever
been published. In particular, Wald in his works always frankly put
the rule forward without any motivation, saying simply that it might
appeal to some. Though my heart is no longer in the objectivistic point
of view, I will in the next few paragraphs suggest a relatively
objectivistic motivation of the rule.

I evolved this far from satisfactory argument at a time when I took
the objectivistic view for granted. Now, as a personalist, it still
seems interesting to me in that it shows, or at least suggests, how
statistical devices combat vagueness, a topic I find very difficult to
discuss directly. On a different level, the argument may shed light on
the personalistic view by suggesting how personalistic ideas entered the
mind of at least one objectivist.

A categorical defense of the minimax rule seems definitely out of the
question. Suppose, for example, that the person is offered an
even-money bet for five dollars (or, to be ultra-rigorous, for five
utiles) that internal combustion engines in American automobiles will be
obsolete by 1970. If there is any event to which an objectivist would
refuse to attach probability, that corresponding to the obsolescence in
question is one. As the example centering around Tables 6.1-2 makes
clear, the minimax rule recommends that the bet be taken or rejected
according as a fair coin falls heads or tails. Yet, I think I may say
without presumption that you would regard the bet against obsolescence
as a very sound investment, agreeing that provision for adequate
interest and compensation for changes in the value of money is implicit
in measurement of income in utiles.

On the other hand, there are practical circumstances in which one
might well be willing to accept the rule, even one who, like myself,
holds a personalistic view of probability. It is hard to state the
circumstances precisely; indeed they seem vague almost of necessity.
But, roughly, the rule tends to seem acceptable when $L^*$ is quite
small compared with the values of $L(f; i)$ for some acts f that merit
serious consideration and some values of i that do not in common sense
seem nearly incredible. Suppose, for example, that I were faced with
such a decision problem, in which it may be assumed for simplicity that
there is only one minimax act f, and consider how I might defend the
choice of that act to someone who proposed another to me. He might, for
example, tell me that he knows from long experience, or by a tip from
his broker, that some act g is preferable to f. "Well," I might say, "I
have all the respect in the world for you and your sources of
information, but you can see for yourself (for it is objectively so)
that the most I can lose if I adopt f is $L^*$." He will not be able to
say the same for g, and in many actual situations the greatest possible
loss under g may be many times as great as $L^*$ and of such a magnitude
as to make a serious difference to me should it occur, which may well
end the argument so far as I am concerned.

It is of interest, however, to imagine that my challenger presses me
more closely, reminding me that I am a believer in personal probability,
and that in fact I myself attach an expected loss L to g that is several
times smaller than $L^*$. Even then, depending on the circumstances, I
might answer frankly that in practice the theory of personal probability
is supposed to be an idealization of one's own standards of behavior;
that the idealization is often imperfect in such a way that an aura of
vagueness is attached to many judgments of personal probability; that
indeed in the present situation I do not feel I know my own mind well
enough to act definitely on the idea that the expected loss for g really
is L; but that I do, of course, feel perfectly confident that f cannot
result in a loss greater than $L^*$, a prospect that in the case at hand
does not distress me much.

It seems to me that any motivation of the minimax principle,
objectivistic or personalistic, depends on the idea that decision
problems with relatively small values of $L^*$ often occur in practice.
The mechanism responsible for this is the possibility of observation.
The cost of a particular observation typically does not depend at all on
the uses to which it is to be put, so when large issues are at stake an
act incorporating a relatively cheap observation may sometimes have a
relatively small maximum loss. In particular, the income, so to speak,
from an important scientific observation may accrue copiously to all
mankind generation after generation.

8 Loss as opposed to negative income in the minimax rule 

As a variant to the minimax rule as I have stated (or perhaps I should
say interpreted) it, one might consider the possibility of letting the
negative of income play the role of the loss in (5.1). Indeed, strictly
speaking, Wald himself always proposed the minimax rule in that
form. I believe he never made written allusion to the rule formulated
in terms of loss (as "loss" is defined here); orally he took the
position that loss and the form of the minimax rule based on it were
inventions of mine, toward which he was tentatively sympathetic. There
is virtually no mathematical difference between the two rules, and it
was characteristic of Wald's approach to the foundations of statistics
to be reluctant to commit himself with respect to any other differences.

Though the minimax rule founded on the negative of income seems
altogether untenable, as will soon be explained, and though no one but
myself seems to question that I originated the variant of the theory
based on loss, little or no originality is attributable to me in this
respect. Wald more than foreshadowed the idea; for, though he based
his minimax rule on the negative of income, he made it clear in
publications, including [W3], that he regarded as typical problems in
which the income has, for every i, the property specified in Exercise
4.4. Therefore, in the situations Wald regarded as typical, the
distinction between the two forms of the rule vanishes; so, until
hearing his explicit disavowal, I considered the idea of loss as opposed
to negative income his.

To see that the minimax rule founded on the negative of income is
utterly untenable for statistics, consider, for example, a twofold
partition problem with two primary acts in which the income is as in
Table 1.

Table 1. $I(f_r; i)$

              Event
Act        B_1     B_2

f_1         -1      -1
f_2        -10       1


Now, if the person were interested in minimizing the maximum of the
negative income, he would have no recourse but to decide on $f_1$, in
which case (but in no other) he could be sure that the negative income
would be at most 1, whichever $B_i$ obtained. This may not in itself
seem objectionable; but suppose now that the person has available free
of cost an observation, however relevant to $B_i$. Then, no matter what
derived act he chooses, if $B_1$ obtains, his negative income will be at
least 1 utile; and, to be sure that it is not more, he again has no
recourse but to decide on $f_1$. In short, for the problem at hand, the
person's behavior would not be influenced by any observation, however
relevant. This seems to me absurd on the face of it, but perhaps the
absurdity can be brought out by a less abstract situation paralleling
the example just given. A person has a ladder, and, just as he is about
to use it, it occurs to him that the ladder may possibly be dangerously
defective. He envisages two basic primary acts: $f_1$, to throw the
ladder away and buy a new one, which will cost 1 utile in either event;
and $f_2$, to use the ladder, which will, if the ladder is defective,
result in his injury to the extent of 10 utiles, and will, if the ladder
is sound, accomplish his object, which is worth 1 utile. Now, if the
person acts on the principle of minimizing the maximum of negative
income, he will throw the ladder away, no matter what tests tend to show
that it is sound.
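
The contrast between the two forms of the rule in the ladder example can
be sketched numerically. Everything about the test below is an
assumption introduced for illustration: a hypothetical free test that
reports "ok" with probability 9/10 when the ladder is sound and 1/10
when it is defective. For brevity only the four unmixed derived acts are
compared.

```python
from fractions import Fraction

STATES = ("defective", "sound")            # B_1 and B_2
# Income table of the ladder example: discard is f_1, use is f_2.
INCOME = {("discard", "defective"): -1, ("discard", "sound"): -1,
          ("use", "defective"): -10, ("use", "sound"): 1}
# Hypothetical test: probability that it reports "ok" in each state.
P_OK = {"defective": Fraction(1, 10), "sound": Fraction(9, 10)}

def derived_income(rule, state):
    """Expected income of a rule mapping the report ('bad' or 'ok') to a primary act."""
    p_ok = P_OK[state]
    return ((1 - p_ok) * INCOME[(rule["bad"], state)]
            + p_ok * INCOME[(rule["ok"], state)])

RULES = {
    "always discard": {"bad": "discard", "ok": "discard"},
    "always use": {"bad": "use", "ok": "use"},
    "follow test": {"bad": "discard", "ok": "use"},
    "defy test": {"bad": "use", "ok": "discard"},
}

income = {name: {s: derived_income(rule, s) for s in STATES}
          for name, rule in RULES.items()}
# Best attainable income in each state, used to convert income into loss.
best = {s: max(income[name][s] for name in RULES) for s in STATES}

max_negative_income = {name: max(-income[name][s] for s in STATES) for name in RULES}
max_loss = {name: max(best[s] - income[name][s] for s in STATES) for name in RULES}

chosen_by_negative_income = min(RULES, key=max_negative_income.get)
chosen_by_loss = min(RULES, key=max_loss.get)
print(chosen_by_negative_income, chosen_by_loss)
```

Under these assumed figures the criterion based on negative income
ignores the test and discards the ladder, whereas the criterion based on
loss, among the unmixed derived acts, lets the test govern the decision.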



CHAPTER 10 


A Personalistic Reinterpretation 
of the Minimax Theory 

1 Introduction 

In this chapter a reinterpretation of the minimax theory, based on
the theory of personal probability and the idea that statistical
problems are typically multipersonal, is tentatively put forward. The
reinterpretation is based on a model or scheme that captures, I believe,
much of the essence of actual statistical situations; but it may be
possible to effect that end with other equally simple and even more
realistic models, for the one to be presented here leaves much to be
desired. In structure, this chapter is kept roughly parallel with
Chapter 9, to enable the reader to examine as closely as he may wish the
parallelism between the objectivistic interpretation given there and the
personalistic one given here. In particular, the liberty is taken of
giving old symbols new meanings in order to bring out the parallelism
between the two interpretations.

2 A model of group decision 

Consider a group of people, indexed by numbers i. These people are
supposed to have the same utility function, at least for the
consequences to be considered in the present context, but their personal
probabilities are not necessarily the same. The group of people is
placed in a situation in which it must, acting in concert, choose an act
f from a finite set of available acts F, the consequences of the acts
being measured in terms of the common utility of the members of the
group.

The situation just described will be called a group decision problem.
It is epitomized by a jury. The members of the jury, in legal theory,
are supposed to have common value judgments in connection with the
legal matters at hand, for these are incorporated in the law as stated
in the instructions of the court. But it is part of the very concept of
a jury that its members may be of different opinions; that their
judgments as to questions of fact may differ; that, to put it
technically, they may have different systems of personal probability.
Still other situations resembling the group decision problem are
widespread in science and industry, though the group decision problem
does by no means represent the only sort of social interaction tending
to make the theory of personal probability, confined to a single person,
inadequate. Whenever a hospital or a factory modifies its procedures,
whenever a doctrine is adopted with little reservation by virtually all
the workers in a science, or whenever a panel of experts drafts a
report, something like group decision is taking place.

Since the members of the group in a group decision problem, though
required to act in concert, typically differ from one another in their
probability judgments, it is too much to expect that any rule can be
formulated that will be acceptable to, or in any sound sense proper for,
all groups under all circumstances. On the other hand, there may be
one or more rules of thumb that will lead the group to an acceptable
compromise in many practical circumstances. Two such suggestions,
the group minimax rule and the group principle of admissibility, will
be made and explored in the next section.

3 The group minimax rule, and the group principle of admissibility

In the first place, the possibility of using mixed acts is to be pointed
out. If, for example, you and I, walking together, disagree about which
branch of a fork in the road leads home, we can, and in fact may,
decide which to try by flipping a coin.

In general, mixed acts are available in a group decision problem for
reasons analogous to their availability in objectivistic decision
problems; for, though the members of a group may generally differ in the
probabilities they personally assign to some events, there is in practice
an abundance of events associated with coins, cards, random numbers,
and the like that make it possible for the group to mix the primary acts
in any proportion, all members of the group being in agreement about
what the proportions are. The example of the fork in the road
illustrates how the use of mixed acts can effect such a compromise as to
make decision possible in what might otherwise be an impasse. As in
the account of the objectivistic decision problems, it will therefore be
taken for granted from now on that F contains all mixtures of its
elements; and once more, for mathematical simplicity, it will be assumed
that there are a finite number of primary acts $f_r$ in F, of which all
others are mixtures.

The ^th person m the group attaches a certain expected utihty, or 
(personal) income, to the act f; call it /(f, i) In the judgment of the 
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^th person, adoption of the act f would represent a (personal) loss, 

(1) L(f, t) = max J(f, ^) - /(f, ^) 

f' 

(possibly zero) as compared with the income or expected utility that
in his opinion would result from an act he considers most promising.
The group minimax rule is the suggestion that an act be adopted
such that the largest loss faced by any member of the group will be as
small as possible. To put it formally, the suggestion is that an f' be
adopted such that

(2)    \max_i L(f'; i) = L^* = \min_f \max_i L(f; i).

The parallelism between the group minimax rule and the minimax rule
stated in § 9.5 is great. In particular, (2) is identical in appearance
with (9.5.1). This is really only a pun, though a fruitful one, because
L, i, and even f have altogether different meanings in the two contexts.
As indicated at the outset, it cannot be expected that the group minimax
rule will, or reasonably should, be accepted by every group faced
with every problem. But, much as in the corresponding objectivistic
decision problems, it may happen that, if L* is small, in a rather vague
sense, the group will accept the group minimax rule. Indeed, if L* is
small, the group minimax rule requires no member of the group to face
a large loss, so no member will feel that the suggestion is a serious mistake.
In any event, no member of the group can suggest an alternative
that will not make some member’s loss as great as L*, for there is none.
Moreover, in many problems the group minimax rule will lead to the
same loss L* for every member of the group (as is explained in § 12.3),
a circumstance which, when it occurs, may add to the acceptability of
the suggestion by making it seem fair.

Of course it is possible that, as in the objectivistic interpretation,
more than one act fulfilling the minimax principle exists. Here, a paraphrase
of the principle of admissibility will further narrow the choice,
for if

(3)    L(g; i) \le L(f; i)

for every i, with strict inequality obtaining for some i, the group cannot
seriously consider f.
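The rule just described can be tried out numerically. The following is only a minimal sketch under stated assumptions: the function names and the income table are hypothetical, the group is restricted to two primary acts, and the incomes are integers so that exact fractions can be used. It forms the losses L(f_r; i) from the incomes as in (1) and then finds the mixture whose largest loss is smallest, as in (2). Because the largest loss is a convex, piecewise linear function of the mixing proportion, it suffices to examine the two endpoints and the points where two members' loss lines cross.

```python
from fractions import Fraction

def losses(income):
    """income[r][i] is I(f_r; i); returns L(f_r; i) = max over r' of I(f_r'; i) - I(f_r; i)."""
    n_people = len(income[0])
    best = [max(row[i] for row in income) for i in range(n_people)]
    return [[best[i] - row[i] for i in range(n_people)] for row in income]

def group_minimax_two_acts(loss):
    """For two primary acts, return (p, L*) where p is the weight on the first act."""
    a, b = loss  # loss rows of the first and second primary act
    n = len(a)

    def worst(p):
        # largest loss faced by any member under the mixture p*f_1 + (1-p)*f_2
        return max(p * a[i] + (1 - p) * b[i] for i in range(n))

    candidates = {Fraction(0), Fraction(1)}
    for i in range(n):
        for j in range(i + 1, n):
            # crossing point of the loss lines of members i and j
            denom = (a[i] - b[i]) - (a[j] - b[j])
            if denom != 0:
                p = Fraction(b[j] - b[i], denom)
                if 0 <= p <= 1:
                    candidates.add(p)
    p_star = min(candidates, key=worst)
    return p_star, worst(p_star)

# Fork in the road (hypothetical incomes): each walker is sure of a different branch.
fork = losses([[1, 0],   # left branch: income 1 to the first walker, 0 to the second
               [0, 1]])  # right branch: the reverse
print(group_minimax_two_acts(fork))
```

For the fork in the road the sketch recovers the coin flip: equal weight on the two branches, with each walker facing a loss of one half.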

4 Critique of the group minimax rule 

Some of the criticisms that have been, or may be, raised against the
minimax rule can as well be discussed in connection with one interpretation
as with the other, and Chapter 13 will be devoted to such criticisms.
But some that bear specifically on the multipersonal interpretation
in this chapter should be discussed here.

In the first place, the group minimax rule is flagrantly undemocratic.
In particular, the influence of an opinion, under the group minimax rule,
is altogether independent of how many people in the group hold that
opinion. In general, it is difficult to give a formal analysis of the concept
of democratic decision, a point discussed at length by Arrow [A5], Hildreth
[H4a], and others. Perhaps, considering that the people in the
group are postulated to have a common utility function, a satisfactory
analysis of democratic decisions could be given in the case of a group
decision problem by some such procedure as minimizing the average
with respect to i of L(f; i). But, in many situations in which I envisage
application of the group minimax principle, the group will in fact be a
rather nebulous body of people, for example the group of all specialists
in some field. The principle would in such a case be administered by a
single member of the group somewhat in the following fashion. In
planning an investigation, the results of which he intends to publish,
he will endeavor to take account of all opinions, so far as he can know
or guess them, that are considered at all reasonable in his field of investigation.
And when he publishes his conclusions he will say, in
effect, “Whatever reasonable opinions have heretofore been held by
members of this specialty, in the light of my investigation and the minimax
rule, it is now proper for the members of the specialty, in so far
as they are called upon to act in concert, to agree to such and such an
action.” To put it a little differently, in such an application the group
is rather fictitious, and the individual investigator is admitting as reasonable
a rather large class of opinions, but excluding many that he
is sure his confreres will agree are utterly absurd. He will, for example,
feel quite free to exclude those opinions that almost all educated people
regard as superstitious.

The group minimax rule is also objectionable in some contexts, because,
if one were to try to apply it in a real situation, the members of
the group might well lie about their true probability judgments, in
order to influence the decision generated by the minimax rule in the
direction each considers correct. This objection is, however, scarcely
serious in the fictitious sort of application suggested above.

It is appropriate, in terminating this section, to discuss a certain distinction,
neglect of which can, as was pointed out to me orally by Bruno
de Finetti, lead to serious misunderstanding of the group minimax rule.
Voluminous observation typically tends to make any one person almost
certain of the truth, and also, when a group of people is involved, it
typically tends to make L* small. These two tendencies, though related,
are separate phenomena, as an illustration will bring out.

Suppose that Peter and Paul are required to bet 1 utile in concert
either that the majority of a large electorate has voted for, or that it
has voted against, a certain issue, but that before betting they are to
be allowed to examine a random sample of 1,001 ballots.

If specific opinions about the division of the electorate are assigned
to Peter and Paul, the situation can be regarded as a group decision
problem. To start with an interesting extreme possibility, suppose
that it is Peter’s unequivocal opinion that 55% of the electorate is for
and 45% is against the issue and Paul’s that the division is 45% for
and 55% against; that is, Peter, for example, is supposed to act as
though he knows that the division is 55%-45%.

If, finally, it is understood that the group decision problem consists
in the two people, Peter and Paul, deciding, before the sample is actually
observed, how their bet is to be determined by the composition
of the sample, then the unique minimax act is to bet that the electorate
majority is whatever the sample majority happens to be. Granting
this easily established solution of the minimax problem, it is obvious
that the two people both face the minimax loss L*. Peter, to be specific,
regards L* as the probability that through random fluctuation the sample
will accidentally fail to corroborate his “knowledge” that the majority
is for the issue. Numerically, L* is about 0.0008.

Peter and Paul, recognizing that the possibility of observing the
sample reduces the minimax loss to about 0.0008 as compared with the
0.5 that it would be if no sample were available, may well find the minimax
act a satisfactory compromise; at any rate, it is hard to see in
this situation how they could arrive at any other.
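The figure of about 0.0008 can be checked directly. The sketch below is an illustration only, and the helper name binom_cdf is not from the text. It takes Peter's point of view, under which each sampled ballot is for the issue with probability 0.55, and computes the probability that at most 500 of the 1,001 ballots are for it, so that the sample majority contradicts him. The sum is carried out in log space to avoid overflow in the binomial coefficients.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X distributed Binomial(n, p), summed term by term in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * log_p + (n - j) * log_q)
        total += math.exp(log_term)
    return total

N_BALLOTS = 1001
# Peter acts as though exactly 55% of the electorate is for the issue;
# the sample majority is against when at most 500 sampled ballots are for.
loss_with_sample = binom_cdf(N_BALLOTS // 2, N_BALLOTS, 0.55)
print(f"minimax loss with the sample: {loss_with_sample:.5f}")
```

By the symmetry of the example the same number is Paul's loss, which is why both face L*.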

Though the incorporation of the sample into the problem has greatly
reduced L*, observation of the sample does not affect the opinion of
either person in the slightest, for unequivocal opinions such as they
hold are not subject to modification in the light of evidence. At least
one of the two people is immovably wrong, and the observation of no
sample, however large, can bring them both close to the truth. This
brings out a contrast between the reduction of L* and the approach to
certainty of the truth, both of which typically occur with the accumulation
of evidence.

The same contrast is expressed by remarking that, though the two
people may readily adopt the minimax act, each feeling that at the expense
of a small risk he is diverting the obstinacy of his colleague to
their common good, after the observation of the sample, one or the
other of them is bound to feel that the prize has been lost by a sad
and improbable accident.

The wary will ask, “Who will feel how, when the actual majority is
disclosed and settlement made? What if Peter’s unequivocal opinion
turns out to be false?” Such questions suggest that paradox lurks in
an example in which different people unequivocally hold mutually inconsistent
opinions, so there is some interest in considering a modification
of the example, free of that objectionable feature.

Suppose then that Peter and Paul, though strongly opinionated about 
the division of the electorate, are not absolutely unequivocal in their
opinions. To be quite definite, suppose that Peter attaches probability
1 - 10^{-10} to the division 55%-45% and probability 10^{-10} to the
division 45%-55%, and that Paul attaches the same probabilities but in
the opposite order to the two divisions. Here, as in the example of the
unequivocal opinions, the unique minimax act is to let the bet be chosen
in accordance with the sample majority; L* is a trifle lower than before.
Observation of the sample does now generally affect the opinions of the
two people, but, though it radically reduces the minimax loss, it does
not typically bring the two people into close agreement. If, for example,
the division is in fact 45%-55%, Paul’s strong a priori belief
that that is the actual division is almost sure to be strengthened by the
sample, and Peter’s equally strong but false belief is almost sure to be
weakened. Still, the probability is only about 1/2 that Peter will be
led by the sample to attach an a posteriori probability even as great
as 0.05 to the actual division. Thus, speaking loosely, but I think practically,
the approach to certainty of the truth is here not typically
nearly so far advanced by observation as is the reduction of the minimax
loss.
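The statement about the a posteriori probability of 0.05 can be illustrated by a short calculation. What follows is a sketch under an explicit assumption: Peter's a priori probability for the division 45%-55% is taken, for illustration, to be 10^{-10} (the constant PRIOR_ON_TRUTH below); the helper names are likewise hypothetical. With x sampled ballots for the issue out of n, Bayes' theorem gives Peter's posterior odds in favor of 45%-55% as his prior odds times (0.55/0.45)^(n - 2x). The sketch then adds up, under the true division 45%-55%, the probabilities of all samples that bring his posterior probability to 0.05 or more.

```python
import math

N_BALLOTS = 1001
PRIOR_ON_TRUTH = 1e-10  # assumed prior probability Peter gives the division 45%-55%

def binom_pmf(x, n, p):
    """P(X = x) for X distributed Binomial(n, p), computed in log space."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                    + x * math.log(p) + (n - x) * math.log1p(-p))

def log_posterior_odds(x):
    """Peter's log odds for 45%-55% against 55%-45% after seeing x ballots 'for'."""
    log_prior_odds = math.log(PRIOR_ON_TRUTH) - math.log1p(-PRIOR_ON_TRUTH)
    return log_prior_odds + (N_BALLOTS - 2 * x) * math.log(0.55 / 0.45)

# A posterior probability of 0.05 corresponds to odds of 0.05 / 0.95.
threshold = math.log(0.05 / 0.95)

# Probability, when the true division is 45%-55%, that the sample moves Peter that far.
prob_peter_moved = sum(binom_pmf(x, N_BALLOTS, 0.45)
                       for x in range(N_BALLOTS + 1)
                       if log_posterior_odds(x) >= threshold)
print(f"probability Peter's posterior reaches 0.05: {prob_peter_moved:.3f}")
```

Under the assumed prior the result is close to one half, in agreement with the text, while the minimax loss computed in the same setting is of the order of one in a thousand.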

It may not be superfluous to point out that the preceding paragraph 
alludes not only to the two different personal probability systems of 
Peter and of Paul, but also to certain conditional probabilities that 
you and I have accepted hypothetically in setting up the example.

Whichever division does actually obtain, it is rather probable that,
once the sample is observed, either Peter or Paul will wish he could
break his contract. This seems to me to reflect a serious objection to
the group minimax principle, especially in those applications in which
the members of the group are not literally consulted, for people cannot
be expected to abide by disappointing contracts they might have made
but didn’t.

For other approaches to the group decision problem see de Finetti 
[D6] and [D7a] 



CHAPTER 11 


The Parallelism between 

the Minimax Theory and 

the Theory of Two-Person Games 

1 Introduction 

John von Neumann, in 1928 [V3], developed a theory of games in
which two people play each other for money.† This theory is mathematically
so closely akin to that of the minimax rule and has had such
influence on its development that it would be artificial to give an exposition
of the minimax rule without saying something of the theory of
what von Neumann calls zero-sum two-person games, though the account
given here must necessarily be highly compressed. The most
convenient references in English to the theory of zero-sum two-person
games, should the reader be interested in a fuller account, are [B18],
[M3], and Chapters II and III of [V4], though those who read German
may find it best to start with the expository sections of the paper [V3]
in which von Neumann first discussed the subject.

The sort of systematic punning by which the formal parallelism between
the objectivistic and personalistic minimax theories was emphasized
in Chapter 10 will be used once more, to bring out the formal
parallelism between those theories and that of zero-sum two-person
games. Logic will be still further sacrificed to clarity and convenience
by calling the two people who play the game “you” and “I.”

2 Standard games 

A certain sort of game, here called a standard game, is defined thus:
You secretly choose a number r from a finite set of possibilities, and I
secretly choose a number i, also from a finite set of possibilities. The
numbers r and i having been chosen, you pay me the sum of money
(possibly negative) L(r; i), where L is an arbitrary function of r and i,
known to both of us. It is assumed that, for the sums involved, each
of us finds money proportional to utility.

† In this completely independent development he was to some extent anticipated
by Émile Borel. Consult [F9], [F10], and [B21] for details and further references.

At first sight, standard games look very dull, though it is immediately
recognized that some such games are played. A tiny but typical example
is the game of “Button, button, who’s got the button?”; “Stone,
paper, scissors” is almost as familiar an example, and others could be
mentioned. But, and this seems remarkable at first, any game, except
possibly those dependent on physical skill, can be viewed as a standard
game. The great generality of standard games is demonstrated in detail
in Chapter II of [V4], but informal discussion of a single example
will render the idea intuitively clear. Suppose then that you and I are
to play a game of poker (of a specified variety). At first sight poker
does not seem to be a standard game, because it involves several random
events, and several decisions on the part of each of us, some to be
made in the light of others. But, it can be argued, there are only a
finite number of different situations that can arise in the course of a
game of poker. You could, therefore, in principle write into a notebook
exactly which choice you would make in each of the possible situations
with which you might be faced in playing poker with me. The number
of possible ways of compiling such notebooks, or policies of play, is
finite, so, except for limitations of time and patience, you will be at
no disadvantage in playing one game with me, if you simply choose
once and for all that one of the many possible policies of play that seems
best to you. Similarly, from my point of view, the game consists, in
principle, in choosing one policy of play. Once you have chosen one
of the policies possible for you, say the rth, and I have chosen one of
the policies possible for me, say the ith, the amount you will have to
pay me at the termination of the game is a random variable. Since it
is agreed that the payments are effectively in utiles for both of us, your
payment to me is effectively the expected value of this random variable,
which may be called L(r; i) and which is in principle known to both
of us as a function of r and i. The elaborate game of two-person poker
is thus exhibited, at some expense to realism, as a standard game.
Regarding the choice of an r by you or an i by me as a primary act,
both of us are at liberty to use mixed acts. Indeed, explicit attention
apparently was first called to the possibility of using mixed acts by
Borel (see [B21]), in just this context.

Let f and g represent mixed acts assigning probabilities φ(r) and γ(i)
to the values r and i, respectively. The standard game is now replaced
by a somewhat different game in which you choose an f, I choose a g,
and you pay me the amount L(f; g), where

(1)    L(f; g) = \sum_{r, i} L(r; i) \phi(r) \gamma(i).

3 Minimax play

Von Neumann adduces an argument, the statement of which will be
briefly postponed, that, if you have respect for my intelligence, you will
see to it that the most I can possibly take from you shall be as small
as possible; that is, you will choose an f' for which

(1)    \max_g L(f'; g) = L^* =_{Df} \min_f \max_g L(f; g).

Symmetrically, according to his argument, I should choose a g' such
that

(2)    \min_f L(f; g') = L_* =_{Df} \max_g \min_f L(f; g).

Since, making the recommended choice, you are sure that you will
not pay me more than L*, and I am correspondingly sure that you will
not pay me less than L_*, it follows that L_* ≤ L*. This inequality
would, of course, have obtained even if mixed acts were not permitted.
It is a remarkable mathematical fact (not to be proved in this book)
that, permitting mixed acts, equality always obtains, so the special
symbol L_* is superfluous here.

The argument for the recommended choices rests on the equality of
L_* and L*. You realize that I can take at least L_* from you and that,
if you are not careful, I may take more. On the other hand, I realize
that you can prevent my taking more than L* from you and that, if
I am not careful, I may get less. This suggests to many that a pair of
intelligent players, each respecting the intelligence of the other, will
each adopt one of the recommended acts.
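The game of stone, paper, scissors gives a small illustration of these ideas. In the sketch below the payoff table and all names are assumptions made for the example: L[r][i] is what you pay me when you choose r and I choose i, with a stake of one unit. Without mixed acts the two values differ; with the uniform mixture on each side they coincide. Since L(f; g) is a weighted average over the pure choices, it is enough to examine the pure replies to each uniform mixture.

```python
from fractions import Fraction

# Choices in the order stone, paper, scissors.
# L[r][i]: amount you pay me when you choose r and I choose i (hypothetical stake of 1).
L = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]
CHOICES = range(3)

# Without mixed acts: the least you can hold me to, and the most I can assure myself.
pure_upper = min(max(L[r][i] for i in CHOICES) for r in CHOICES)
pure_lower = max(min(L[r][i] for r in CHOICES) for i in CHOICES)

# With mixed acts: try the uniform mixture for each player.
third = Fraction(1, 3)
uniform = [third, third, third]

# The most I can take from you if you mix uniformly (my best pure reply).
worst_for_you = max(sum(uniform[r] * L[r][i] for r in CHOICES) for i in CHOICES)
# The least you can pay me if I mix uniformly (your best pure reply).
worst_for_me = min(sum(uniform[i] * L[r][i] for i in CHOICES) for r in CHOICES)

print(pure_upper, pure_lower, worst_for_you, worst_for_me)
```

The uniform mixture thus guarantees you against paying more than 0 and guarantees me against receiving less than 0, so that both values equal 0 once mixed acts are allowed.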

4 Parallelism and contrast with the minimax theories 

Some formal parallelism between the minimax theories of decision
and the theory of zero-sum two-person games is evident, but the parallelism
is much more complete than may appear at first sight. The mixtures
g are without counterpart in the two minimax theories of decision,
and the appearance of g in (3.1) at the place where i appears in
(9.5.1) may seem to mar the parallelism between these two equations.
But, letting

(1)    L(f; i) =_{Df} \sum_r L(r; i) \phi(r),

in the game theory (in close parallelism with the decision theories),

(2)    L(f; g) = \sum_i L(f; i) \gamma(i) \le \max_i L(f; i),

and

(3)    \max_g L(f; g) = \max_i L(f; i).

Therefore (3.1) is equivalent to

(4)    \max_i L(f'; i) = \min_f \max_i L(f; i) = L^*.

Thus from the point of view of the minimax theories of decision the
g’s represent no material innovation and are at worst useless baggage.
Actually, though of little if any relevance in the interpretation of the
minimax theories, the g’s constitute a useful mathematical device.
Their usefulness has in fact been illustrated in working out the second
example in § 9.6 and will be systematically demonstrated in the next
chapter, along with the usefulness of the apparently irrelevant “maximin”
problem posed by (3.2) and of the fact that L_* = L*.

Some remarks on the possibility of interpreting the g’s in the minimax
theories are postponed to the end of this section.

In the game theory, L may be any function whatsoever of its arguments
r and i, but, in the decision theories, L is subject to the condition
that, for every i,

(5)    \min_r L(r; i) = 0,

where L(r; i) is of course to be interpreted as L(f_r; i). Here is the only
mathematical difference between the game theory and the decision
theories, the former being mathematically slightly more general than
the latter.

Though the mathematical differences are negligible, the intellectual
difference between the situations leading to the game theory on the
one hand and to the decision theories on the other is great. Serious
misunderstandings of the (objectivistic) minimax theory have often resulted
from identifying it with the game theory. Among other things,
loss is then confounded with negative income, and the misconception
that the (objectivistic) minimax rule is ultrapessimistic is created. I
have even heard it stated on this account that the minimax rule amounts
to the assumption that nature is malevolently opposed to the interests
of the deciding person.

Though mathematical convenience seems to be the basic reason for
introducing the g’s in the minimax theories, it is tempting to ask whether
the g’s have also some natural interpretation in those theories. At the
moment, I do not see a convincing interpretation in either theory, but
completeness demands an account of an interpretation suggested by
Wald for his version of the objectivistic theory, especially since this
interpretation influenced some of Wald’s most widely used terminology.
The objectivistic problem of deciding on an act in ignorance of which
partition element obtains, the P(B_i) being regarded as meaningless,
suggests a new problem that may perhaps also be called objectivistic.
The new problem arises on postulating that P(B_i) is meaningful but
utterly unknown, that is, P(B_i) = γ(i), where the γ(i)’s are the components
of a g here interpreted as the a priori distribution unknown to
the deciding person.

Since for Wald “loss” was synonymous with “negative expected income,”
he naturally calculated the loss of the new problem thus:

(6)    L(f; g) = -E(f | g)
               = \sum_i -E(f | B_i) P(B_i)
               = \sum_i L(f; i) \gamma(i),

arriving thus at the very function suggested by the game theory. In
Wald’s version of the theory, the new problem therefore amounts to
the formal introduction of the g’s in connection with the old one, which
neatly fulfills the reasonable expectation that there should be no material
difference between regarding P(B_i) as meaningless and regarding
it as meaningful but utterly unknown.

The suggested interpretation of a g as an unknown (or, to mirror
Wald more faithfully, fictitious) a priori distribution does not work,
however, if the loss function of the new problem is defined by (9.4.1),
for the new function \bar{L}(f; g) is not then generally the same as the function
L(f; g) suggested by the game theory; thus

(7)    \bar{L}(f; g) = \max_{f'} E(f' - f | g)
                     = \max_{f'} \sum_i E(f' - f | B_i) \gamma(i)
                     = \max_{f'} \sum_i [L(f; i) - L(f'; i)] \gamma(i)
                     = L(f; g) - \min_{f'} L(f'; g)
                     \le L(f; g),

equality holding for a typical g (i.e., a g such that γ(i) > 0 for every i)
only in the altogether trivial situation that F is dominated by one of
its elements.

Does this mean that, contrary to expectation, there is a material difference
between the new problem with loss \bar{L} and the old one? The following
exercises show that it does not.

Exercises

1. \max_g \bar{L}(f; g) = \max_i L(f; i).

2. \min_f \max_g \bar{L}(f; g) = L^*.

3. \max_g \bar{L}(f; g) = L^* if and only if \max_i L(f; i) = L^*.
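A numerical spot check, which is of course no substitute for the proofs the exercises ask for, may help fix the meaning of the function defined in (7). The sketch below rests on assumptions: the loss table is hypothetical (each column has minimum 0, as the decision theories require), the names are invented, and the minimum over mixtures f' in (7) is taken over the primary acts only, which is legitimate because L(f'; g) is linear in f'. For random mixtures f and g it checks that the new function never exceeds L(f; g) or the largest L(f; i), and that it coincides with L(f; i) when g puts all its weight on a single i.

```python
import random

# Hypothetical loss table L[r][i]; every column has minimum 0.
L = [[0, 3, 1],
     [2, 0, 4],
     [5, 1, 0]]
N_ACTS, N_STATES = 3, 3

def loss_fi(f, i):
    """L(f; i): loss of the mixture f when i obtains."""
    return sum(f[r] * L[r][i] for r in range(N_ACTS))

def loss_fg(f, g):
    """L(f; g): the function suggested by the game theory."""
    return sum(g[i] * loss_fi(f, i) for i in range(N_STATES))

def loss_bar(f, g):
    """The function of (7): L(f; g) minus the least L(f'; g), attained at a primary act."""
    best = min(sum(g[i] * L[r][i] for i in range(N_STATES)) for r in range(N_ACTS))
    return loss_fg(f, g) - best

def random_mixture(k, rng):
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def spot_check(trials=1000, seed=0, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        f = random_mixture(N_ACTS, rng)
        g = random_mixture(N_STATES, rng)
        worst_pure = max(loss_fi(f, i) for i in range(N_STATES))
        if loss_bar(f, g) > loss_fg(f, g) + tol or loss_bar(f, g) > worst_pure + tol:
            return False
        for i in range(N_STATES):
            point_mass = [1.0 if j == i else 0.0 for j in range(N_STATES)]
            if abs(loss_bar(f, point_mass) - loss_fi(f, i)) > tol:
                return False
    return True

print(spot_check())
```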



CHAPTER 12 


The Mathematics

of Minimax Problems

1 Introduction 

Since the two different minimax decision theories and the theory of
zero-sum two-person games have a common mathematical core, it will
be worth while to digress for a chapter, even at the expense of some
repetition, to discuss this common core mathematically, that is, virtually
without reference to its various possible interpretations. The
discussion will have to be drastically confined relative to the large body
of relevant literature, but the reader who wishes to pursue the subject
much further will find [B18], [V4], [W3], and [M3] to be key references.

2 Abstract games 

To begin with a very general situation, which will later be specialized
to the one of main interest, let f and g denote generic elements of any
two abstract sets, and let L(f; g) be the value of an essentially arbitrary
real-valued function. It will, however, be assumed for simplicity that
for every f' and g' the quantities

(1)    \max_g L(f'; g),    \min_f L(f; g'),
       L^* =_{Df} \min_f \max_g L(f; g),    L_* =_{Df} \max_g \min_f L(f; g)

exist. To say that a maximum, for example, exists is not only to say
that the function in question is bounded from above, but also that the
maximum value is actually attained for at least one value of the argument.
For want of a more neutral term, call the function L(f; g) an
abstract game.

An f' is called minimax, if and only if

(2)    \max_g L(f'; g) = L^*,

and a g' is called maximin, if and only if

(3)    \min_f L(f; g') = L_*.

The existence of minimax and maximin values of the variables is implicit
in (1). It is an easy exercise to show that f' is minimax, if and
only if

(4)    L(f'; g) \le L^*

for every g.

The corresponding characterization of a maximin g' as one such
that

(5)    L(f; g') \ge L_*

for every f could similarly be shown. But the symmetry of the situation
is such that it would be superfluous to derive this characterization
of a maximin explicitly. Indeed, every theorem, or general conclusion,
about L(f; g) obviously has a dual, which arises on applying the theorem
to the new abstract game \tilde{L}(g; f) with \tilde{L}(g; f) = -L(f; g). This
is typical of what is known in mathematics as a duality principle. Henceforth
the duals of demonstrated conclusions, even when not explicitly
stated, will be as freely used as the demonstrated conclusions themselves.
Some conclusions are of course self-dual. Incidentally, another
example of a duality principle was used in § 5.4, and a very important
one was pointed out in connection with Boolean algebra in § 2.4.

An argument showing that L_* ≤ L* was given in connection with
the theory of games. More formally, if f' and g' are, respectively, minimax
and maximin, then from (4) and (5)

(6)    L^* \ge L(f'; g') \ge L_*.

It is possible, indeed typical, that L_* < L*. Suppose, for example,
that f and g are variables that take only two values and that L(f; g)
is described by Table 1. Here, as the reader should verify, both f’s
and both g’s are minimax and maximin, respectively, and L* = 1,
L_* = 0.

Table 1  L(f; g)

                  g
               1     2
    f    1     0     1
         2     1     0

The following theorem is frequently applicable to the identification
of minimax and maximin values of f and g, and of L* and L_*:

Theorem 1  If f', g', and the number C are such that L(f'; g) ≤ C
≤ L(f; g') for every f and g, then L* = L_* = C = L(f'; g'), f' is minimax,
and g' is maximin.

Proof. First, C ≥ L*, because

(7)    C \ge \max_g L(f'; g) \ge \min_f \max_g L(f; g) = L^*,

and, dually, C ≤ L_*. But L_* ≤ L*, so C ≤ L_* ≤ L* ≤ C, that is,
L_* = L* = C. Now (4) and (5) apply. ♦

Corollary 1  If f' and g' are such that L(f'; g) ≤ L(f; g') for every
f and g, then f' and g' are, respectively, minimax and maximin, and L*
= L_* = L(f'; g').
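For a finite abstract game given as a table, the quantities of this section can be computed by direct enumeration. The following sketch uses hypothetical function names and a second, made-up table; it evaluates L* and L_* and lists the pairs f', g' that satisfy the hypothesis of Corollary 1. Rows of the table correspond to values of f and columns to values of g, and no mixing is involved.

```python
def upper_value(table):
    """L*: the minimum over f of the maximum over g of L(f; g)."""
    return min(max(row) for row in table)

def lower_value(table):
    """L_*: the maximum over g of the minimum over f of L(f; g)."""
    n_cols = len(table[0])
    return max(min(row[g] for row in table) for g in range(n_cols))

def corollary_pairs(table):
    """All (f', g') with L(f'; g) <= L(f; g') for every f and g."""
    pairs = []
    for f_prime, row in enumerate(table):
        for g_prime in range(len(table[0])):
            if max(row) <= min(other[g_prime] for other in table):
                pairs.append((f_prime, g_prime))
    return pairs

table_1 = [[0, 1],
           [1, 0]]      # Table 1 of the text
other_table = [[2, 3],
               [1, 4]]  # hypothetical table for which the two values agree

for table in (table_1, other_table):
    print(upper_value(table), lower_value(table), corollary_pairs(table))
```

For Table 1 the two values are 1 and 0 and no pair qualifies, as the theory requires when L_* < L*; for the second table both values are 3 and the single qualifying pair is the first f with the second g (indices 0 and 1 in the code).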

3 Bilinear games 

If one stumbles somehow onto a pair f', g' satisfying the hypothesis
of Corollary 2.1, then he has discovered a minimax, a maximin, and
the values (in this case equal to each other) of L* and L_*. But that
possibility of discovery does not exist unless L* = L_*, which at the
level of generality of the last section is unusual. Almost all real interest,
however, centers on a very special class of abstract games, here to
be called bilinear games, for which it is demonstrable that L* is invariably
equal to L_*.

The definition of bilinear games involves several steps. First, consider
an abstract game, L(r; i), based on a pair of variables, r and i.
The two variables are here assumed for simplicity to have only a finite
number of possible values, an assumption that can, and for statistics
must, be considerably relaxed. Next, let f and g be non-negative functions
of r and i, respectively, arbitrary except for the constraint that

(1)    \sum_r f(r) = \sum_i g(i) = 1,

in short, probability measures on the r’s and i’s, respectively. Finally,
the bilinear game L(f; g) is defined thus:

(2)    L(f; g) = \sum_{r, i} L(r; i) f(r) g(i).

It is important to recognize that the duality principle continues to
hold, that is, if L(f; g) is a bilinear game, then \tilde{L}(g; f) = -L(f; g) is
also one.

In terms of the auxiliary functions

(3)    L(f; i) =_{Df} \sum_r L(r; i) f(r),
       L(r; g) =_{Df} \sum_i L(r; i) g(i),

the following equalities and inequalities can easily be verified by the
reader:

(4)    \max_g L(f; g) = \max_i L(f; i),
       \min_f L(f; g) = \min_r L(r; g).

(5)    \min_r \max_i L(r; i) \ge \min_f \max_i L(f; i) = L^* \ge L_*
           = \max_g \min_r L(r; g) \ge \max_i \min_r L(r; i).

But more can be said in connection with (5), for it has been shown by
von Neumann [V3] that for the special class of functions now under
discussion L* is actually equal to L_*. This important equality cannot
conveniently be proved here, but the interested reader can refer to the
relatively simple proof given by von Neumann and Morgenstern in
Section 17.6 of [V4] (reading first, if necessary, the introduction to the
mathematics of convex sets that constitutes Chapter 16 of that book)
or to the version of it presented in [B18].

In the light of the equality of L* and L_*, (5) becomes

(6)    \min_r \max_i L(r; i) \ge \min_f \max_i L(f; i) = L^*
           = \max_g \min_r L(r; g) \ge \max_i \min_r L(r; i).

In view of (4) and (6), Theorem 2.1 can be much improved upon for
bilinear games:

Theorem 1  For bilinear games, the following three conditions on
f', g', and C are equivalent:

1. f' is minimax, g' is maximin, and L* = C.

2. L(f'; g) ≤ C ≤ L(f; g') for every f and g.

3. L(f'; i) ≤ C ≤ L(r; g') for every i and r.

Proof. Condition 2 implies 1, by Theorem 2.1; 1 implies 3 by (6);
and 3 implies 2 by (4). ♦




Corollary 1  A necessary and sufficient condition that f be minimax
is that, for some g, L(f; i) ≤ L(r; g) for every r and i. Under
that condition L* = L(f; g), and g is maximin.

Corollary 1 seems an especially appropriate expression of Theorem 1
in connection with the minimax decision theories, where the g’s are, after
all, not really of interest in themselves. Theorem 1, and equivalently
Corollary 1, are of great practical value. To be sure, there are algorithms,
or rules (given by Shapley and Snow in [S12]), by which L*
and all minimax values of f can in principle be computed, but these algorithms
are so awkward to apply that in practice one generally guesses
one or more minimax f’s, and also a maximin g, on the basis of some
clues, verifying the guess and evaluating L* by Corollary 1. To finish
the job, one then finds, if one can, an argument to show that the minimax
f’s thus discovered are all there are. This rather imperfect procedure
is especially important, since it can relatively easily be extended
to many situations in which r and i are not confined to finite ranges, as
does not seem to be true of the algorithms.
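The verification step of the guess-and-verify procedure is mechanical and can be sketched in a few lines. In the sketch below the function name check_guess and the 2 x 2 table are hypothetical; the check itself is the condition of Corollary 1, namely that L(f; i) ≤ L(r; g) for every r and i, which is the same as requiring that the largest L(f; i) not exceed the smallest L(r; g). Exact fractions are used so that equality is tested without rounding.

```python
from fractions import Fraction

def check_guess(table, f, g):
    """Return L* if the guessed f and g satisfy Corollary 1, and None otherwise.

    table[r][i] is L(r; i); f and g are probability weights on the r's and i's.
    """
    n_r, n_i = len(table), len(table[0])
    loss_f = [sum(f[r] * table[r][i] for r in range(n_r)) for i in range(n_i)]  # L(f; i)
    loss_g = [sum(g[i] * table[r][i] for i in range(n_i)) for r in range(n_r)]  # L(r; g)
    if max(loss_f) <= min(loss_g):
        # the condition holds, so f is minimax, g is maximin, and L* = L(f; g)
        return sum(f[r] * g[i] * table[r][i] for r in range(n_r) for i in range(n_i))
    return None

table = [[0, 2],
         [3, 0]]  # hypothetical L(r; i)

good_f = [Fraction(3, 5), Fraction(2, 5)]
good_g = [Fraction(2, 5), Fraction(3, 5)]
half = [Fraction(1, 2), Fraction(1, 2)]

print(check_guess(table, good_f, good_g))  # a guess that passes
print(check_guess(table, half, half))      # a guess that fails
```

A failed check shows only that the particular pair guessed is not a minimax together with a maximin; it says nothing about either member of the pair taken separately.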

As was mentioned in § 10.3 and as the examples that have been given
illustrate, if f is minimax, then L(f; i) is in practice often actually equal
to L* for all, or at least many, values of i. Insight into that phenomenon
is given by the following theorem.

Theorem 2  If i is such that there exists a maximin g for which
g(i) > 0, then L(f; i) = L* for every minimax f.

Proof. L(f; i) ≤ L*, because f is minimax. Therefore L(f; g), being
a weighted average of the L(f; i)’s, is at most L*, and it is actually
less, if any term with positive weight is not equal to L*. But L(f; g)
≥ L*, because g is maximin. ♦

It can happen, and in statistical practice it often does happen, that
every i satisfies the hypothesis of Theorem 2, in which case L(f; i) =
L* for every i and every minimax f.

Theorem 2 often provides a basis for guessing a minimax f, a maximin
g, and the value of L*, which can then be checked by application of
Corollary 1. To take a simple example, suppose that there are n values
of r, and n of i. There may be some reason to conjecture that each i
is used by some maximin g, that is, that each i satisfies the hypothesis
of Theorem 2. If the conjecture is in fact true, then f(r) and L* satisfy
the system of equations

(7)    \sum_r 1 \cdot f(r) + 0 \cdot L^* = 1,
       \sum_r L(r; i) f(r) - 1 \cdot L^* = 0.

Typically, (7) as a system of n + 1 linear equations in n + 1 variables
will have exactly one solution (f(r), L*). This solution, if the conjecture
is valid, will actually consist of the components of a minimax f
(in this case the only one) and the value of L*. But the conjecture is
not yet confirmed. In particular, if any f(r) in the solution of (7) is
negative, it is contradicted; if not, the investigation can proceed. The
candidates for maximin values of g are now, by the dual of Theorem 2,
among the solutions of the system

(8)    \sum_i 1 \cdot g(i) + 0 \cdot L^* = 1,
       \sum_i L(r; i) g(i) - 1 \cdot L^* = 0,

where r is confined to the values for which /(r) >0 To consider only 
the simplest and most typical case, suppose fir) > 0 for every r. Re- 
garding L* as known, (8) consists of n + 1 equations for n vanables, 
which at first sight might be expected generally to have no solution. 
To put the matter differently, if one forgets for the moment that L* 
has been determined by (7), it might seem possible that (8) could lead 
to a different value, say L*'. But, using the latter part of (8) and then 
the first part of (7), it is seen that 

(9) Z L(r, ^)Kr)g(i) = = L^', 

r, I r 

and dually the double sum equals L*, so discrepancy between L* and 
L*' is not among the real snags in the tentative program — irrespective 
of the number of r^s participating in (8). Fmally, if (8) leads to even 
one set of positive ^(^)'s, it follows from Corollary 1 that the f and L* 
derived from (7) are the unique minimax and the true value of L*, re- 
spectively 
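
The program outlined in connection with (7) and (8) can be sketched in code. What follows is only an illustrative sketch, not part of the original argument: the function names are hypothetical, the game is assumed to be square with integer or rational losses stored as L[r][i], and L* is treated as an extra unknown in (8), which by (9) does no harm. A None result means only that the conjecture failed, not that the game has no solution.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((k for k in range(col, n) if M[k][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for k in range(n):
            if k != col and M[k][col] != 0:
                factor = M[k][col]
                M[k] = [a - factor * c for a, c in zip(M[k], M[col])]
    return [row[n] for row in M]


def solve_square_game(L):
    """Return (f, g, value) if the conjecture behind (7) and (8) holds, else None."""
    n = len(L)
    rhs = [1] + [0] * n
    # System (7): sum_r f(r) = 1 and, for each i, sum_r L(r;i) f(r) - L* = 0.
    A7 = [[1] * n + [0]] + [[L[r][i] for r in range(n)] + [-1] for i in range(n)]
    sol7 = solve_linear(A7, rhs)
    f, value = sol7[:n], sol7[n]
    if any(x <= 0 for x in f):
        return None  # negative f(r): contradicted; zero f(r): outside the simplest case
    # System (8): sum_i g(i) = 1 and, for each r, sum_i L(r;i) g(i) - L* = 0.
    A8 = [[1] * n + [0]] + [[L[r][i] for i in range(n)] + [-1] for r in range(n)]
    sol8 = solve_linear(A8, rhs)
    g = sol8[:n]
    if any(x < 0 for x in g):
        return None  # no admissible g: the conjecture is not confirmed
    return f, g, value


print(solve_square_game([[0, 2], [1, 0]]))
```

Exact rational arithmetic is used so that the sign tests on f(r) and g(i) are not disturbed by rounding.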

The converse of Theorem 2 has been proved by Bohnenblust, Karlin,
and Shapley in [B19], though their proof cannot be reproduced here.
As is pointed out by these authors, the converse does not extend at all
readily to situations involving infinite ranges of r and i. Theorem 2
and its converse can be summarized thus.

Theorem 3 There exists a maximin g for which g(i) > 0, if and
only if L(f; i) = L* for every minimax f.

4 An example of a bilinear game 

It is now convenient to discuss a certain example, or rather a class of
examples, of bilinear games, namely those in which i takes only two
values, say 1 and 2. Two preliminary remarks will help to orient the
discussion. First, bilinear games in which i takes only one value are
devoid of interest, for the minimax problem in that case is simply a
problem of finding an ordinary minimum. Second, the discussion of
bilinear games in which i takes only two values includes, in effect,
because of the duality principle, the discussion of those in which r
takes only two values.

If i takes only the two values 1 and 2, the values g = {g(1), g(2)}
can be represented graphically by points on an interval, as illustrated
at the foot of Figure 1. For every r, L(r; g) is linear as a function of
g, as is L(f; g) for every f. It is, of course, just because the L(f; g)
of a bilinear game is linear in this sense and its dual that I use the
term "bilinear." In Figure 1 the five slanting solid lines represent the
five linear functions L(r; g) of a bilinear game in which r (for
illustration) takes five values and i takes two. The dashed lines
represent two values of f, each of which has for simplicity been so
chosen as to use, or mix, only two values of r.

As may be verified by inspection, the particular bilinear game
represented by Figure 1 has the special property that min_r L(r; i) = 0
for each i, which is the distinguishing property of those bilinear games
that arise in connection with the minimax decision theories described in
Chapters 9 and 10.

Figure 1 bears a more than accidental resemblance to Figure 7.2.1.
In particular, the concave function

$$
\min_r L(r; g)
\tag{1}
$$

marked by heavy line segments in Figure 1 is closely analogous to the
convex function so marked in Figure 7.2.1. The particular g emphasized
by Figure 1 is that for which the function (1) attains its maximum
value, which according to (3.6) is L*. This g is therefore the unique
maximin. It has been shown quite generally in [B19] that bilinear games
with more than one minimax or maximin are, in a sense, unusual.
Figure 1 makes it graphically clear that the special bilinear games now
under consideration do usually have a unique maximin, because there
is more than one maximin only in case (1) happens to have a horizontal
segment.

What are the minimax f's for the bilinear game represented by Figure
1? According to the dual of Theorem 3.2, an r cannot be used in the
formation of a minimax f unless L(r; g) = L* for the (in this case
unique) maximin g. That consideration eliminates all but two of the
r's from consideration, and it is graphically clear that this will usually
be the case for bilinear games in which i takes only two values.
Theorem 3.2 itself, applied to the particular game under discussion,
shows that the graph of L(f; g) as a function of g must be horizontal
for any minimax f. The two preceding conditions together eliminate all
values of f except the one corresponding to the horizontal dashed line
in Figure 1, and that f is indeed minimax, because L(f; i) = L* for both
values of i.

To specialize still further, suppose that r as well as i takes only two
values. Such a game can, of course, be represented graphically in the
spirit of Figure 1. Several qualitatively different situations can occur,
which might, for example, be classified by the relation of the two linear
functions L(r; g) to each other. The reader should graph and consider
many or all of these possibilities for himself. The only one treated
here will be that in which the two functions cross each other at an
interior g, with one function sloping up and the other down. It is
graphically clear that there will then be a unique minimax and a unique
maximin, as will now be shown analytically.

The condition postulated can be expressed without loss of generality
thus:

$$
\begin{aligned}
L(1; 2) &> L(1; 1), & L(2; 1) &> L(2; 2), \\
L(2; 1) &> L(1; 1), & L(1; 2) &> L(2; 2).
\end{aligned}
\tag{2}
$$

Or, more mnemonically,

$$
L(1; 2),\ L(2; 1) > L(1; 1),\ L(2; 2).
\tag{3}
$$

It is conjectured, in this case on graphical grounds, that the program
outlined in connection with (3.7-8) applies, and the reader can indeed
verify that that program leads to the conclusion

$$
L^* = \{L(1; 2) L(2; 1) - L(1; 1) L(2; 2)\} / \Delta,
\tag{4}
$$

where

$$
\Delta = L(1; 2) + L(2; 1) - L(1; 1) - L(2; 2);
\tag{5}
$$

and that the unique minimax f and maximin g are

$$
f(1) = [L(2; 1) - L(2; 2)] / \Delta, \qquad f(2) = [L(1; 2) - L(1; 1)] / \Delta,
\tag{6}
$$

$$
g(1) = [L(1; 2) - L(2; 2)] / \Delta, \qquad g(2) = [L(2; 1) - L(1; 1)] / \Delta.
\tag{7}
$$

If the game arises from an application of the minimax decision theory,
(3) almost always applies. More precisely, in this case, except possibly
for the order of numbering,

$$
L(1; 1) = L(2; 2) = 0 \quad \text{and} \quad L(1; 2),\ L(2; 1) \ge 0,
\tag{8}
$$

so, if only the inequalities in (8) are both strict, (3) applies. Then
(4-7) specialize to

$$
L^* = L(1; 2) L(2; 1) / \Delta,
\tag{9}
$$

where

$$
\Delta = L(1; 2) + L(2; 1);
\tag{10}
$$

$$
f(1) = L(2; 1) / \Delta, \qquad f(2) = L(1; 2) / \Delta,
\tag{11}
$$

$$
g(1) = L(1; 2) / \Delta, \qquad g(2) = L(2; 1) / \Delta.
\tag{12}
$$
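
The closed formulas for the two-by-two game under condition (3) are easy to check numerically. The fragment below is a minimal sketch with a hypothetical function name; it assumes integer or rational losses, passed in the order L(1;1), L(1;2), L(2;1), L(2;2), and it simply refuses games that violate (3).

```python
from fractions import Fraction


def solve_2x2(L11, L12, L21, L22):
    """Evaluate L*, f, and g for a 2 x 2 bilinear game satisfying condition (3)."""
    if not min(L12, L21) > max(L11, L22):
        raise ValueError("condition (3) does not hold")
    delta = Fraction(L12 + L21 - L11 - L22)
    value = (L12 * L21 - L11 * L22) / delta
    f = ((L21 - L22) / delta, (L12 - L11) / delta)
    g = ((L12 - L22) / delta, (L21 - L11) / delta)
    return value, f, g


value, f, g = solve_2x2(1, 4, 3, 2)
# Both columns give the same loss against f, and both rows against g.
print(value, f[0] * 1 + f[1] * 3, f[0] * 4 + f[1] * 2)
print(value, g[0] * 1 + g[1] * 4, g[0] * 3 + g[1] * 2)
```

That L(f; i) and L(r; g) all come out equal to the returned value is exactly the equalizing property that Theorem 3.2 and its dual require here.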
5 Bilinear games exhibiting symmetry

Mathematically the solution of a bilinear game is often simplified by
considerations of symmetry. For statistical applications, the
implications of symmetry for bilinear games are of fundamental
importance in so far as they represent a counterpart in the minimax
theory of the disreputable but irrepressible principle of insufficient
reason. This section discusses these implications in an elementary, but
formal, way. It can be skimmed over or skipped outright without much
detriment to the understanding of later sections.

Any discussion of symmetry involves, at least implicitly, the branch
of mathematics known as the theory of groups. Though what is to
be said here about games exhibiting symmetry is intended to be clear
without prior knowledge of the theory of groups, it may be mentioned
that introductions to that subject are to be found in many places, for
example in [B14].

It can, and in practice often does, happen that a bilinear game has
some symmetry.† This means that there are permutations, here
symbolized by T, T', etc., of the values of r among themselves and the
values of i among themselves such that

$$
L(Tr; Ti) = L(r; i)
\tag{1}
$$

for every r and i, where, of course, Tr and Ti are the values into which
T carries r and i respectively. Permutations satisfying (1) are said to
leave the game invariant, or to belong to the group (of symmetries) of
the game. The permutation U that leaves every r and every i fixed must
be counted among the permutations in the group of the game, but the
game has no symmetry (worthy of the name) unless there are other
permutations besides U in its group.

An example of a game with high symmetry is the game implicit in
the second example of § 9.6, for, to any permutation whatsoever of the
six i's in that game among themselves, there is a corresponding
permutation of the r's such that the two permutations taken together
leave the game invariant. It was, of course, the exploitation of symmetry
that made the treatment of that example relatively simple.

Returning to bilinear games in general, if T and T' are in the group
of the game, then the product TT' defined by the condition that

$$
(TT')r =_{Df} T(T'r), \qquad (TT')i =_{Df} T(T'i)
\tag{2}
$$

is obviously also a permutation in the group of the game. This
multiplication of permutations somewhat resembles the ordinary
multiplication of numbers. In particular, (TT')T'' is evidently the same
as T(T'T''), though it is not necessarily true that TT' = T'T.

† This concept must not be confused with that of "symmetrical games,"
which are symmetrical in the sense that the equation L(r; i) = −L(i; r)
is meaningful and true for every r and i.

Relative to this multiplication the permutation U plays the role of
the unit, or number 1, in arithmetic, for it is obvious that TU = UT
= T for any permutation T.

For every permutation T, there is evidently a permutation $T^{-1}$, and
one only, that undoes T, that is, one such that $T^{-1}T = U$. It is easy
to see also that $TT^{-1} = U$ and that, if T is in the group of the game,
$T^{-1}$ is too. The notation $T^{-1}$ is of course motivated by the
consideration that, relative to the multiplication of permutations,
$T^{-1}$ plays the role of the reciprocal of T.

It will be adopted as a definition that Tf and Tg are the functions
such that $Tf(r) = f(T^{-1}r)$ and $Tg(i) = g(T^{-1}i)$ for every
permutation T and for every r and i. The intervention of $T^{-1}$ in this
definition may at first seem arbitrary, but it is motivated by the
following considerations. First, if f is, for example, the function such
that $f(r_0) = 1$ and $f(r) = 0$ for $r \ne r_0$, then Tf should be such
that $Tf(Tr_0) = 1$ and $Tf(r) = 0$ for $r \ne Tr_0$. Second, S(Tf) should
be (ST)f rather than (TS)f. The definition having been adopted,
L(Tf; Tg) can be calculated thus:

$$
\begin{aligned}
L(Tf; Tg) &= \sum_{r, i} L(r; i) f(T^{-1}r) g(T^{-1}i) \\
&= \sum_{r, i} L(Tr; Ti) f(T^{-1}Tr) g(T^{-1}Ti) \\
&= \sum_{r, i} L(Tr; Ti) f(r) g(i),
\end{aligned}
\tag{3}
$$

where the basic fact is exploited that, if r, i runs once through all pairs
of values, then Tr, Ti also does so. It follows from (1) and (3) that, if
T is in the group of the game, then

$$
L(Tf; Tg) = L(f; g).
\tag{4}
$$

An f (g) is called invariant under the group of the game, if and only if
Tf = f (Tg = g) for every T in the group. There is a natural way to
construct from any f an f invariant under the group, and dually for g.
Namely, let

$$
\bar f =_{Df} \frac{1}{n} \sum_T Tf, \qquad
\bar g =_{Df} \frac{1}{n} \sum_T Tg,
\tag{5}
$$

where (here and throughout this section) n is the number of elements
in the group and the summation is over all elements of the group. The
definition (5) accomplishes its objective, because

$$
\sum_r \bar f(r) = \frac{1}{n} \sum_T \sum_r f(T^{-1}r)
= \frac{1}{n} \sum_T 1 = 1
\tag{6}
$$

and

$$
\begin{aligned}
T'\bar f(r) &= \bar f(T'^{-1}r) \\
&= \frac{1}{n} \sum_T f(T^{-1}T'^{-1}r) \\
&= \frac{1}{n} \sum_T T'Tf(r) = \bar f(r)
\end{aligned}
\tag{7}
$$

for every r and for every T' in the group. In (7) use is made of the
easily established facts that $(T'T)^{-1} = T^{-1}T'^{-1}$ and that as T
runs once through the group so does T'T. The justification of $\bar g$
is, of course, dual to that of $\bar f$. It is noteworthy that
$\bar f = f$, if and only if f is invariant under the group of the game.

Suppose R (I) is a set of the r's (i's). Then, by definition, r ∈ TR
(i ∈ TI), if and only if $T^{-1}r$ ∈ R ($T^{-1}i$ ∈ I); and the set R (I)
is invariant under the group of the game, if and only if TR = R (TI = I)
for every T in the group.
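
The averaging construction of definition (5) can be tried out directly. In the sketch below, which is illustrative only, a permutation T of the r's is stored as a tuple perm with perm[r] equal to Tr, the list passed as group is assumed to contain every element of the group exactly once, and the names act and symmetrize are hypothetical. Only the r side is shown; the treatment of g is dual.

```python
from fractions import Fraction


def act(perm, f):
    """Return Tf, the function with Tf(Tr) = f(r), where perm[r] is Tr."""
    out = [None] * len(f)
    for r, weight in enumerate(f):
        out[perm[r]] = weight
    return out


def symmetrize(group, f):
    """Average Tf over all T in the group, giving an invariant f."""
    n = len(group)
    images = [act(T, f) for T in group]
    return [Fraction(sum(img[r] for img in images)) / n for r in range(len(f))]


# Group of order 2 on three r's: the identity and the exchange of r = 0 and r = 1.
group = [(0, 1, 2), (1, 0, 2)]
f = [Fraction(1, 2), Fraction(0), Fraction(1, 2)]
f_bar = symmetrize(group, f)
print(f_bar, all(act(T, f_bar) == f_bar for T in group))
```

The averaged weights still sum to 1 and are unchanged by every permutation in the group, which are the two properties that (6) and (7) establish in general.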

Exercises 

1a. If R is invariant, so is ∼R, the complement of R.

1b. If R and R' are invariant, so are R ∩ R' and R ∪ R'.

1c. The vacuous set and the set of all r's are invariant.

2. For every R, let $\bar R =_{Df} \bigcup_T TR$, where T is of course
confined to the group, and, for every r, define the trajectory of r as
[r], the set $\bar R$ formed from the set R whose only element is r.

(a) $\bar R$ is the smallest invariant set containing R.

(b) $\bar R$ is the intersection of all invariant sets containing R.

(c) $\bar R = \bigcup_{r \in R} [r]$.

(d) [r] is the smallest invariant set of which r is an element.

3a. If R is invariant, and R ∩ [r] ≠ 0, then R ⊃ [r].

3b. If R is invariant, and r ∈ R, then R ⊃ [r].

3c. If [r] ∩ [r'] ≠ 0, then [r] = [r'].

4a. The following conditions are equivalent:
α. R is invariant.
β. R = $\bar R$.
γ. For every r ∈ R, [r] ⊂ R.
δ. R is partitioned into sets each of which is a trajectory.

4b. The following conditions are equivalent:
α. f is invariant.
β. The set of r's for which f takes any given value is invariant.
γ. f is constant on every trajectory.

5a. If T'r = r, then $(TT'T^{-1})Tr = Tr$.

5b. If {r} denotes the number of elements of the group that leave r
fixed, then {r} = {Tr}.

5c. If ||r|| denotes the number of elements in [r], then n = {r} ||r||.

5d. Both {r} and ||r|| are divisors of n.

5e. The value of $\bar f$ everywhere on the trajectory of r is

$$
\frac{1}{\| r \|} \sum_{r' \in [r]} f(r').
\tag{8}
$$

6. Note the dual of each of the preceding exercises.


In the establishment of all these preliminaries, the theory of bilinear
games has been almost lost sight of, but it is now possible to say much
about the significance of invariant functions and sets for bilinear games.
I begin with a theorem valued for some of its corollaries rather than
for any charm of its own.

Theorem 1 If L(f'; Tg) ≤ L(f''; Tg) for every T, then
$L(\bar f'; g) \le L(\bar f''; g)$. If in addition L(f'; g) < L(f''; g),
then $L(\bar f'; g) < L(\bar f''; g)$.

Proof.

$$
L(T^{-1}f'; g) = L(f'; Tg) \le L(f''; Tg).
\tag{9}
$$

Therefore

$$
\begin{aligned}
L(\bar f'; g) &= \frac{1}{n} \sum_T L(T^{-1}f'; g) \\
&\le \frac{1}{n} \sum_T L(f''; Tg) = L(\bar f''; g).
\end{aligned}
\tag{10}
$$

If L(f'; g) < L(f''; g), then (9) is strict for T = U, and therefore (10)
is also strict. ♦


Corollary 1 If L(f'; Tg) = L(f''; Tg) for every T, then
$L(\bar f'; g) = L(\bar f''; g)$.

Corollary 2 If L(f'; g) = L(f''; g) for every g, then
$L(\bar f'; g) = L(\bar f''; g)$ for every g.

Corollary 3 $L(\bar f; g) = L(f; \bar g) = L(\bar f; \bar g)$ for every
f and g.

Corollary 4 If f is invariant under the group of the game,
$L(f; g) = L(f; \bar g)$ for every g.

Paraphrasing some of the nomenclature of § 6.4, if L(f'; g) ≤ L(f''; g)
for every g, say that f' dominates f''; if f' dominates f'', but f'' does
not dominate f', say that f' strictly dominates f''; if f' dominates f'',
and f'' dominates f', say that f' and f'' are equivalent; if f' is not
strictly dominated by any f, say that f' is admissible.

Corollary 5 If f' dominates, strictly dominates, or is equivalent
to f'', then $\bar f'$ dominates, strictly dominates, or is equivalent to
$\bar f''$, respectively.

Corollary 6 If $L(f; Tg) \le L(\bar f; Tg)$ for every T, then
$L(f; g) = L(\bar f; g)$.

Corollary 7 If $L(f; i) \le L(\bar f; i)$ for every i ∈ I, where I is
invariant under the group of the game, then $L(f; i) = L(\bar f; i)$ for
i ∈ I.

Corollary 8 It is impossible that f strictly dominates $\bar f$.

Theorem 2 $\max_g L(\bar f; g) \le \max_g L(f; g)$, equality holding, if
and only if the right-hand maximum is attained for a g invariant under
the group of the game.

Proof.

$$
\max_g L(\bar f; g) = \max_g L(f; \bar g) \le \max_g L(f; g).
\tag{11}
$$

The inequality in (11) follows from the fact that every $\bar g$ is a g;
equality holds, if and only if the final maximum is attained for some
$\bar g$, that is, for some invariant g. ♦

Corollary 9 If f is minimax, so is $\bar f$.

Corollary 10 There exists a minimax f invariant under the group
of the game.

If a game has more than one minimax f, it is tempting to suppose
that in statistical, if not in all, applications of the theory an invariant,
or symmetrical, minimax f would recommend itself at least as highly
as any other minimax f. This supposition, being vague, cannot be
really proved, but certain facts tend to support it. In particular, the
following theorem is a reassuring improvement of Corollary 10.

Theorem 3 There is at least one admissible, invariant, minimax f.

Proof. It is a direct consequence of a theorem (Theorem 2.22, p. 54,
of [W3]) of Wald's, too technical for statement or proof here, that at
least one invariant minimax f is strictly dominated by no invariant f.
If that f were strictly dominated by any f'' (invariant or not), it would
also, according to Corollary 5, be strictly dominated by the invariant
$\bar f''$, which is impossible. Therefore f is admissible. ♦


If the bilinear game has high symmetry or, more explicitly, if the
number of trajectories into which the r's or the i's, or both, are
partitioned is small, the search for invariant minimax f's and invariant
maximin g's is relatively simple. An invariant minimax is characterized
as an invariant f' such that

$$
\max_g L(f'; g) = \min_f \max_g L(f; g) = L^*.
\tag{12}
$$

But, since at least one invariant minimax exists, the criterion (12) is
not changed if the minimization on its right side is confined to
invariant f's; with f so confined, the criterion remains unchanged, if
both maximizations are confined to invariant g's (as Corollary 3 shows).
Thus the search for invariant minimax f's and invariant maximin g's
amounts to the solution of an abstract game that arises from the
original bilinear game by ruling out certain values of f and g, namely
the un-invariant ones.

This new and smaller abstract game can be exhibited as a bilinear
game thus: Let it be understood for the moment that r' ranges over
such a set of the r's that there is exactly one r' in every trajectory [r];
dually for i'. For invariant f and g,

$$
\begin{aligned}
L(f; g) &= \sum_r \sum_i L(r; i) f(r) g(i) \\
&= \sum_{r'} \sum_{i'} \sum_{r \in [r']} \sum_{i \in [i']} L(r; i) f(r) g(i) \\
&= \sum_{r'} \sum_{i'} f(r') g(i') \sum_{r \in [r']} \sum_{i \in [i']} L(r; i) \\
&= \sum_{r'} \sum_{i'} L'(r'; i') f'(r') g'(i'),
\end{aligned}
\tag{13}
$$

where

$$
L'(r'; i') =_{Df} \frac{1}{\| r' \| \, \| i' \|}
\sum_{r \in [r']} \sum_{i \in [i']} L(r; i),
\tag{14}
$$

and

$$
f'(r') =_{Df} \| r' \| f(r'), \qquad g'(i') =_{Df} \| i' \| g(i').
\tag{15}
$$

Finally, it is easily verified that, except for the conditions
f'(r') ≥ 0, g'(i') ≥ 0, and $\sum_{r'} f'(r') = \sum_{i'} g'(i') = 1$, the
coefficients f'(r') and g'(i') are arbitrary. The new game is therefore
to all intents and purposes a bilinear game with only as many r''s and
i''s as there are r-trajectories and i-trajectories, respectively, in the
original game. The new game, incidentally, may well have symmetry of
its own.
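
The passage from a symmetric game to the smaller game on trajectories can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, losses are assumed to be integers or rationals stored as L[r][i], each permutation is a tuple whose entry at position r is Tr, and the lists of permutations are assumed to contain the whole group, so that the trajectory of r is the set of all its images.

```python
from fractions import Fraction


def trajectories(perms, size):
    """Partition range(size) into trajectories under the full list of group elements."""
    seen, parts = set(), []
    for start in range(size):
        if start in seen:
            continue
        orbit = sorted({p[start] for p in perms})
        seen.update(orbit)
        parts.append(orbit)
    return parts


def reduced_game(L, r_trajectories, i_trajectories):
    """Average L(r; i) over each block [r'] x [i'] to obtain L'(r'; i')."""
    return [[Fraction(sum(L[r][i] for r in R for i in I), len(R) * len(I))
             for I in i_trajectories]
            for R in r_trajectories]


# A 2 x 2 game left invariant by exchanging both r's and both i's together.
L = [[0, 1], [1, 0]]
r_perms = [(0, 1), (1, 0)]
i_perms = [(0, 1), (1, 0)]
small = reduced_game(L, trajectories(r_perms, 2), trajectories(i_perms, 2))
print(small)
```

In this example both the r's and the i's form a single trajectory, so the reduced game has one entry, which is therefore L* for the original game.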

If there is only one r- or one i-trajectory, the new game is so simple it
scarcely deserves to be called a game. This occurs, for example, in the
second example of § 9.6, where there is only one i-trajectory. In that
situation there is only one invariant g, and it is equal at every i to the
reciprocal of the total number of i's (which is here the value of ||i||
for every i). That g must therefore be an admissible maximin. The
value of L* is therefore given by

$$
L^* = \min_r \frac{1}{\| i \|} \sum_i L(r; i).
\tag{16}
$$

The invariant minimax f's are those and only those invariant f's such
that f(r) = 0 for every r that fails to minimize the sum in (16).
Moreover, here the minimax f's (invariant or not) are all equivalent, as
can be argued thus. Any invariant minimax $\bar f$ is such that

$$
L(\bar f; g) = L(\bar f; \bar g) = L^*
\tag{17}
$$

for every g. If any minimax f whatsoever failed to satisfy (17), it
would strictly dominate $\bar f$, but according to Corollary 8 that is
impossible. Therefore in the very special situation at hand all minimax
f's satisfy (17) and are accordingly equivalent.

It is, of course, important to extend consideration of symmetry to
bilinear games with infinite sets of r's and i's, and infinite groups of
symmetries, but the task has not yet proved straightforward. Two key
references bearing on it are [L4] and [B17].
CHAPTER 13

Objections to the Minimax Rules

1 Introduction

I have already expressed and supported my opinion that neither the
objectivistic nor the personalistic minimax rule can be categorically
defended (§ 9.7 and § 10.3). On the other hand, certain objections have
been leveled against the objectivistic rule (that being the well-known
one) that seem to me to call for reinterpretation, if not outright
refutation.

2 A confusion between loss and negative income 

Some objections valid against the minimax rule based on negative
income are irrelevant to that based on loss. The notions that the
minimax rule is ultrapessimistic and that it can lead to the ignoring of
even extensive evidence have already been discussed as examples of such
objections.

Another example I would put in the same category has been suggested
by Hodges and Lehmann [H5]. In this example a person who has
observed n independent tosses of a coin for which the probability of
heads has an unknown value p is required to predict the outcome of the
(n + 1)th toss. Hodges and Lehmann here interpret prediction in the
following somewhat sophisticated, but reasonable, sense. The person
is, in the light of his observation, required to choose a number
$\hat p$ between 0 and 1 and to pay a fine of $(1 - \hat p)^2$ or
$\hat p^2$ according as the (n + 1)th toss is in fact heads or tails.
Thus the (expected) income attached to the primary act $\hat p$ and
event p is

$$
\begin{aligned}
I(\hat p; p) &= -p(1 - \hat p)^2 - (1 - p)\hat p^2 \\
&= -(\hat p - p)^2 - p(1 - p).
\end{aligned}
\tag{1}
$$

As Hodges and Lehmann show, the only derived act (mixed or pure)
that yields the minimax of the negative income is to set $\hat p = 1/2$
irrespective of the observation. But it is, in common sense, absurd thus
to ignore the observation of the first n tosses. In view of this
absurdity, almost everyone would agree that applying the minimax rule
directly to the negative of (1) is a foolish act for the person to employ.

The absurdity of minimizing the maximum of negative income in
this example is of course no valid argument against minimizing the
maximum loss. It is easy to see that the loss corresponding to (1) is

$$
L(\hat p; p) = (\hat p - p)^2.
\tag{2}
$$

As Hodges and Lehmann happen to show in the same paper [H5]
(though in a different context), and as will be discussed in some detail
in § 4, the unique minimax derived act does use the observations to
advantage, resulting in a loss of

$$
\frac{1}{4(1 + n^{1/2})^2}
\tag{3}
$$

irrespective of p. The absurd act of setting $\hat p = 1/2$ irrespective
of the observation results in the loss $(p - 1/2)^2$, which in any
ordinary context would be inferior to (3), especially for large n.

Incidentally, the minimax derived from (2), though not nearly so
bad as setting $\hat p$ identically equal to 1/2, is itself open to a
serious objection, which will be explained in § 4.

3 Utility and the minimax rule 

Some objections to the objectivistic, and mutatis mutandis to the
group, minimax rule are in effect objections to the concept of utility,
which underlies the minimax rules. Criticisms of the concept of utility
have already been discussed in Chapter 5, particularly in § 5.6, but
certain aspects of the discussion need to be continued here.

It is often said, and I think with justice, that, even granting the
validity of the utility concept in principle, a person can seldom write
down his income function I(r; i) with much accuracy. This idea is
put forward sometimes with one interpretation and sometimes with
another. Of these, only the first is strictly an objection to the utility
concept.

That one is a dilemma raised by the phenomenon of vagueness.
Vagueness may so blur a person's utility judgments that he cannot
accurately write down his income function. I suppose that no one will
seriously deny this; I would be particularly embarrassed to do so, for
it is almost a recapitulation of the very argument that leads me, though
in principle a personalist, to see some sense in the objectivistic decision
problem. On the other horn, if all meaning is denied to utility (or some
extension of that notion) no unification of statistics seems possible.
Three special circumstances are known to me under which escape from
the dilemma is possible. First, there are problems in which some
straightforward commodity, such as money, lives, man hours, hospital
bed days, or submarines sighted, is obviously so nearly proportional to
utility as to be substitutable for it. Second, there are problems in
which exact or approximate minimax decisions can be calculated on
the basis of only relatively little, and easily available, information about
the income function, such as symmetry, monotoneity, or smoothness.
The possibility of cheap extensive observation, which (when it occurs)
makes the minimax principle attractive, also tends to make many
decision problems fall into both of the two types in which the difficulty
of vagueness is alleviated. For example, in a monetary decision problem
with cheap observation available, it often happens that the weak
law of large numbers, and the like, can be invoked to justify regarding
cash income as proportional to utility income.

Third, there are many important problems, not necessarily lacking
in richness of structure, in which there are exactly two consequences,
typified by overall success or failure in a venture. In such a problem,
as I have heard J. von Neumann stress, the utility can, without loss
of generality, be set equal to 0 on the less desired and equal to 1 on the
more desired of the two consequences.

The second sense in which it may, though not quite properly, be
said to be impossible to write down the income function is typified by
this example. A manufacturer of small short-lived objects, say paper
napkins, is faced with the problem of deciding on a program of
sampling to control the quality of his product. He complains that, though
for this problem his utility is adequately measured by money, he
cannot write down his income function because he does not know how the
public will react to various levels of quality; that, in particular, the
minimax rule does not tell him at all how much he ought to spend on
the sampling program, though it may say how any given amount can
best be employed. The manufacturer has a real difficulty, though he
expresses it inaccurately. He forgets that the lack of knowledge that
gives rise to the decision problem involves not only the state of his
product, but also the state of the public; taking the state of the public
into account, there is no real difficulty in writing down the income
function. But, if it is not practical for the manufacturer to make
observations bearing on the state of the public as well as those bearing
on the state of the product, the minimax rule is not a practical solution
to his problem, for, rigorously applied, it would remove him from the
paper-napkin business. I believe that in practice the personalistic method
often is, and must be, used to deal with the unknown state of the
public, while objectivistic methods, particularly the minimax principle,
are now increasingly often used to deal with the state of the product,
a sort of dualism having some parallel in almost all serious applications
of statistics. This is not to deny that relatively objectivistic methods
of market research can sometimes be used, nor that there are
personalistic elements aside from those concerning the state of the
public in much of even the most advanced quality control practice.

4 Almost sub-minimax acts 

Another sort of objection to the objectivistic minimax rule is
illustrated by the following example attributed to Herman Rubin and
published by Hodges and Lehmann [H5]. An integer-valued random
variable x subject to the binomial distribution

$$
P(x \mid p) = \binom{n}{x} p^x (1 - p)^{n - x}
\tag{1}
$$

is observed by a person who knows n but not p. His decision problem
is to decide on a function $\hat p$ of x subject to the loss function

$$
\begin{aligned}
L(\hat p; p) &= E((\hat p - p)^2 \mid p) \\
&= \sum_x (\hat p(x) - p)^2 \binom{n}{x} p^x (1 - p)^{n - x}.
\end{aligned}
\tag{2}
$$

In other terms, he must estimate p on the basis of an observation of x
and subject to a loss equal to the square of his error. The traditional
estimate of p is defined by $\hat p_0(x) = x/n$. This estimate has many
virtues; it is the maximum-likelihood estimate, the only unbiased
estimate, and (as is shown in [G1]) the only minimax estimate for a
somewhat different problem from that posed by (2). But for (2) the unique
minimax is (as is shown in [H5]) defined by

$$
\hat p_1(x) = \hat p_0(x) + \frac{\tfrac{1}{2} - \hat p_0(x)}{1 + n^{1/2}}.
\tag{3}
$$

As it is straightforward to verify for every p,

$$
L(\hat p_0; p) = \frac{p(1 - p)}{n}
\tag{4}
$$

and

$$
L(\hat p_1; p) = \frac{1}{4(1 + n^{1/2})^2},
\tag{5}
$$

which constant is, therefore, L*. The ratio of the first of these functions
to the second is

$$
4p(1 - p)(1 + n^{-1/2})^2,
\tag{6}
$$

the maximum of which occurs at p = 1/2 and is

$$
(1 + n^{-1/2})^2.
\tag{7}
$$

Thus, for large n, the maximum loss of $\hat p_0$ is larger than L* by
only a slight fraction. Moreover, the loss of $\hat p_0$ is less than L*
except when p lies in the interval where

$$
4p(1 - p) > (1 + n^{-1/2})^{-2},
\tag{8}
$$

that is, where

$$
\left| p - \tfrac{1}{2} \right| < \tfrac{1}{2}
\{1 - (1 + n^{-1/2})^{-2}\}^{1/2}.
\tag{9}
$$
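
The two loss functions of this section can be checked by direct summation over the binomial distribution. The fragment below is a small sketch offered for illustration; the names risk, p0, and p1 are hypothetical stand-ins for the loss function and the two estimates, and floating-point arithmetic is used, so agreement is to rounding error only.

```python
from math import comb, isclose, sqrt


def risk(estimate, p, n):
    """Expected squared error of the estimate x -> estimate(x, n) when x is binomial(n, p)."""
    return sum((estimate(x, n) - p) ** 2 * comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))


def p0(x, n):
    """The traditional estimate x / n."""
    return x / n


def p1(x, n):
    """The minimax estimate: x / n pulled toward 1/2 by the factor 1 / (1 + sqrt(n))."""
    return p0(x, n) + (0.5 - p0(x, n)) / (1 + sqrt(n))


n = 25
for p in (0.1, 0.5, 0.9):
    print(p, risk(p0, p, n), p * (1 - p) / n, risk(p1, p, n), 1 / (4 * (1 + sqrt(n)) ** 2))
```

For each p the first pair of printed numbers should agree with each other, and the second pair should agree and be the same for every p, which is what makes the second estimate minimax.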

To take a numerical example, consider n = 10^5 (which the practical
will note is rather big for a sample). The advantage of $\hat p_1$ over
$\hat p_0$ at p = 1/2 is then only 0.64%, and, once p departs by as much
as 0.04 from 1/2 in either direction, the advantage is with $\hat p_0$.
It amounts, for example, to 3.5%, 55%, ∞% in favor of $\hat p_0$, when
p is 0.6, 0.8, 1.0, respectively.
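
Figures of this kind follow from the two loss formulas and the bound on the distance of p from 1/2. The sketch below is illustrative only: the function names are hypothetical, and it assumes that n is of the order 10^5 and that the advantage of the traditional estimate is measured as L* divided by its loss, less one.

```python
import math


def loss_p0(p, n):
    """Loss of the traditional estimate, p(1 - p) / n."""
    return p * (1 - p) / n


def loss_star(n):
    """Constant loss of the minimax estimate, 1 / (4 (1 + sqrt(n))^2)."""
    return 1 / (4 * (1 + math.sqrt(n)) ** 2)


def advantage_of_p0(p, n):
    """L* / L(p0; p) - 1; negative where the minimax estimate has the smaller loss."""
    denominator = loss_p0(p, n)
    if denominator == 0:
        return math.inf
    return loss_star(n) / denominator - 1


def half_width(n):
    """Half-width of the interval around 1/2 inside which the minimax estimate is better."""
    return 0.5 * math.sqrt(1 - (1 + n ** -0.5) ** -2)


n = 10 ** 5
print(half_width(n))
for p in (0.5, 0.6, 0.8, 1.0):
    print(p, advantage_of_p0(p, n))
```

Changing n in the last lines shows how quickly the interval in which the minimax estimate is superior shrinks as the sample grows.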

Many agree that in such an example good judgment will, under
ordinary circumstances, prefer $\hat p_0$ to the recommendation of the
minimax rule, $\hat p_1$. To my mind, this example constitutes a valid
objection against the minimax rule, in the sense that it demonstrates
once more that, whatever value that rule may have, it is at best a rule
of thumb.

The example is a good illustration of the role of personal probability
in ordinary statistical thinking, for the source of the dissatisfaction a
person would ordinarily feel for $\hat p_1$ as opposed to $\hat p_0$
stems from the fact that he would not ordinarily attach enough personal
probability to the immediate neighborhood of p = 1/2 to justify
preference for $\hat p_1$. It follows from the numbers given above, for
example, that, if the person attaches a probability of less than 0.84 to
the interval [0.4, 0.6], he will prefer $\hat p_0$ to $\hat p_1$; the same
conclusion can be derived from the supposition that the standard
deviation of the personal distribution of p is at least 0.04. Of course,
situations can be imagined in which the personal probabilities would be
so concentrated about 1/2 as to justify preference for $\hat p_1$; the
point of the example is only that there are situations in which that
would clearly not be the case.

Interesting material and important references bearing on the
phenomenon illustrated by the decision problem under discussion are given
by Wolfowitz in [W17]. It seems to be suggested there that the
difficulty can be met by postulating some small amount ε by which the
person does not mind having his income decreased. Taken literally,
this postulate implies on repeated application that all incomes are
equivalent for the person, but Wolfowitz makes it clear that he does
not mean to propose the postulate in a sense that allows repeated
applications. The idea is reminiscent of those theories of probability
that permit the neglect of an occasional improbable event (mentioned
in the last paragraph of § 4.4) and seems to me open to an objection
similar to the one raised in connection with them. In particular, the
choice of the ε would be not only personal, but ill defined as well.

5 The minimax rule does not generate a simple ordering

Finally, an objection made by Chernoff [C7] to the objectivistic
minimax theory must be discussed. This will entail statement and
illustration of the phenomenon on which the objection is based, and
statement and analysis of the objection itself.

The phenomenon pertains to the relation between two objectivistic
decision problems, to be called for the moment the narrow and the
wide problems. The narrow problem is determined by certain primary
acts $f_r$, and the wide one is determined by those primary acts and one
more, say $f_0$. In other words, the wide problem presents the person
with one more choice than the narrow. Calling the two income
functions I(f; i) and $I_0$(f; i), it is to be understood, of course, that
I(f; i) = $I_0$(f; i) for any f that does not use, that is, give positive
weight to, $f_0$. The corresponding equation does not necessarily obtain
for the loss functions; indeed it clearly does so, if and only if the
maximum of $I_0$(f; i) in f can be attained for each i without using
$f_0$. Even in case no minimax of the wide game uses $f_0$, it is
therefore to be expected that the minimax f's of the wide game will be
different from those of the narrow game. In fact, it can happen that no
minimax of the wide game uses either $f_0$ or any $f_r$ used by a minimax
of the narrow game; this is the phenomenon to be discussed in this
section.

To see how the phenomenon can occur, suppose that Figure 12.4.1
represents the loss function of the narrow problem, and consider what
the corresponding figure is for the wide problem, supposing that $f_0$
is such that

$$\Delta =_{Df} I(f_0; 2) - \max_r I(f_r; 2) > 0, \tag{1}$$

$$\delta =_{Df} \max_r I(f_r; 1) - I(f_0; 1) > 0.$$

It is clear that $\Delta$ and $\delta$ can attain any positive values,
irrespective of the structure of the narrow problem. The figure for the
wide problem is constructed thus: The graph corresponding to each $f_r$
is left fixed at its right end and raised by the amount $\Delta$ at its
left, and $f_0$ is represented by a line sloping up with slope $\delta$
from the lower left-hand corner. It is easy to see that the raising of
the left ends of the graphs of the $f_r$'s can make any $f_r$ with a
positive slope horizontal. If, further, such an $f_r$ minimizes
$L(f; g)$ for some $g$, it can be made a minimax by choosing $\delta$
sufficiently large. Thus, speaking specifically of Figure 12.4.1, the
$f_r$ corresponding to the left segment of the heavy concave graph,
which is not used in the minimax of the narrow problem, can become the
unique minimax. Figure 12.4.1 is a little special in that the heavy
concave graph has only one vertex to the left of the maximin of the
narrow problem. If there were more than one, the phenomenon could also
be exhibited by making the second vertex to the left the unique
maximin, which would occur for all $\Delta$'s and $\delta$'s in a
certain range. Thus the phenomenon occurs not only for isolated values
of $\Delta$ and $\delta$ but typically for whole domains of values.
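
The construction can be checked numerically on a small two-state case.
The sketch below is only an illustration: the four narrow acts `f1` to
`f4`, their incomes, and the choice of 4 and 20 for the two positive
quantities in (1) are assumed for the purpose and are not taken from
Figure 12.4.1. With two states some minimax mixture uses at most two
primary acts, so an exact search over pairs of acts suffices; the helper
names `loss_table` and `minimax` are likewise hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def loss_table(income):
    """Loss = best income attainable in each state minus the act's income."""
    best = [max(v[i] for v in income.values()) for i in (0, 1)]
    return {r: (Fraction(best[0] - v[0]), Fraction(best[1] - v[1]))
            for r, v in income.items()}

def minimax(loss):
    """Minimax value and supports among mixtures of at most two acts.

    On each pair of acts the larger of the two losses is convex in the
    mixing weight t, so only t = 0, t = 1, and the equalizing t matter.
    """
    found = []
    for a, b in combinations(sorted(loss), 2):
        da = loss[a][0] - loss[a][1]
        db = loss[b][0] - loss[b][1]
        weights = [Fraction(0), Fraction(1)]
        if da != db and 0 < da / (da - db) < 1:
            weights.append(da / (da - db))
        for t in weights:
            mix = [(1 - t) * loss[a][i] + t * loss[b][i] for i in (0, 1)]
            support = frozenset(r for r, w in ((a, 1 - t), (b, t)) if w > 0)
            found.append((max(mix), support))
    value = min(v for v, _ in found)
    return value, {s for v, s in found if v == value}

# Incomes (state 1, state 2) of the assumed narrow acts.
narrow = {"f1": (10, 0), "f2": (7, 7), "f3": (5, 9), "f4": (0, 10)}
# The extra act f0 beats every narrow act by 4 in state 2
# and falls short of the best narrow act by 20 in state 1.
wide = dict(narrow, f0=(10 - 20, 10 + 4))

print(minimax(loss_table(narrow)))  # value and support for the narrow problem
print(minimax(loss_table(wide)))    # value and support for the wide problem
```

In this assumed example the narrow problem has `f2` as its only minimax,
while the wide problem has `f3` as its only minimax, although `f0`
itself is not used.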

Suppose, to take a striking case, that one $f_r$, say $f_{r'}$, is the
unique minimax for the narrow problem and a different one, $f_{r''}$,
is the unique minimax for the wide problem. It is absurd, as Chernoff
says in effect, to recommend $f_{r'}$ as the best act among the
$f_r$'s when only the $f_r$'s are available and then to recommend
$f_{r''}$ as the best for an even wider class of possibilities. Fancy
saying to the butcher, “Seeing that you have geese, I'll take a duck
instead of a chicken or a ham.”

It is absurd, then, to contend that the objectivistic minimax rule
selects the best available act. But that is not so devastating to the
rule as might at first appear, for it is not contended by anyone known
to me that the rule does select the best. On the contrary, the rule is
invoked only as a sometimes practical rule of thumb in contexts where
the concept of “best” is impractical: impractical for the objectivist,
where it amounts to the concept of personal probability, in which he
does not believe at all, and for the personalist, where the difficulty
of vagueness becomes overwhelming. To have a consistent concept of
“best,” that is, to have a mode of decision that does not exhibit the
phenomenon, amounts, as Chernoff himself points out, to the
establishment of a simple ordering of preference among acts. In so far
as that can be done consistently with the sure-thing principle,
personal probability is practically defined thereby. If the sure-thing
principle is violated, the ordering is absurd as an expression of
preference. For example, the rule of minimizing the maximum of the
negative of income does not exhibit the phenomenon. It amounts to
considering $f \le f'$, if and only if

$$\max_i \, [-I(f; i)] \ge \max_i \, [-I(f'; i)]. \tag{2}$$

This establishes a simple ordering, but one that violates the
sure-thing principle by violating P2.
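
The violation of P2 can be seen in a two-state example. The sketch
below is illustrative only: the incomes are assumed numbers, and
`worst_case` and `compare` are hypothetical helpers for the ordering
that ranks acts by their smallest income, which is what minimizing the
maximum of the negative of income comes to. Two pairs of acts differ
only in the income they share in the second state, yet the ordering
treats the pairs differently.

```python
def worst_case(income):
    """Smallest income of an act; maximizing it minimizes max_i [-I(f; i)]."""
    return min(income)

def compare(f, g):
    """Return '<', '=', or '>' for the ordering induced by worst-case income."""
    wf, wg = worst_case(f), worst_case(g)
    return "<" if wf < wg else ">" if wf > wg else "="

# f and g differ only in state 1 and share the income 0 in state 2.
f, g = (5, 0), (3, 0)
# f2 and g2 are the same acts with the shared state-2 income changed to 10.
f2, g2 = (5, 10), (3, 10)

print(compare(f, g))    # the shared bad outcome masks the difference
print(compare(f2, g2))  # the same difference now decides the ordering
```

Under P2 the comparison within each pair could not depend on the income
the two acts have in common, so the two printed symbols would have to
agree.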

The phenomenon has a particularly natural interpretation for the
group minimax rule. It would not be strange, for example, if a
banquet committee about to agree to buy chicken should, on being
informed that goose is also available, finally compromise on duck.



CHAPTER 14


The Minimax Theory 
Applied to Observations 

1 Introduction 

In this chapter the concept of observation is re-explored from the
point of view of the minimax rule. In principle, objectivistic and
group minimax problems should here be treated on an equal footing. But,
since mathematically the two theories are identical, it seems wisest to
focus on one, interjecting occasional digressions about the other. I
have chosen to focus on the objectivistic problems. That choice, being
in accordance with other literature on the minimax rule, will
facilitate the reader's further study of the subject, and it also
renders more obvious the intimate connection between the minimax rules
and the theory of partition problems presented in Chapter 7. The
present chapter can indeed be regarded largely as a paraphrase of
Chapter 7, so there will unavoidably be many references to the
notations and conclusions of that chapter.

2 Recapitulation of partition problems 

Paralleling the treatment of observation in Chapters 6 and 7, an
objectivistic observational problem will be roughly defined to consist
of an objectivistic problem, regarded as basic, an observation, and a
second objectivistic problem, derived from the basic one and the
observation.

More explicitly, the basic problem may be any objectivistic problem.
It will be characterized by the values of $E(f \mid B_i)$, where $f$
ranges over a set of acts $F$ subject to the conditions laid down in
§ 9.3, and $B_i$ is a partition.

The observation is a random variable $\mathbf{x}$ (confined, as usual
in this book, to a finite set of values), subject to the conditional
distributions $P(x \mid B_i)$, and so articulated with $F$ that
$E(f \mid B_i, x) = E(f \mid B_i)$ for every $x$ such that
$P(x \mid B_i) > 0$. The last condition is (7.2.7); as mentioned in
connection with that equation, the condition will in particular be
met, if every $f$ is constant on every $B_i$, a specialization costing
but little in real generality.

The derived problem (paralleling § 6.2) consists of $F(\mathbf{x})$,
the set of all functions assigning elements $f$ of the basic acts $F$
to values $x$ of the observation $\mathbf{x}$. The values of
$E(f(\mathbf{x}) \mid B_i)$ for $f(\mathbf{x}) \in F(\mathbf{x})$ are
computable from the $E(f \mid B_i)$ and the $P(x \mid B_i)$ thus:

$$\begin{aligned}
E(f(\mathbf{x}) \mid B_i) &= E\bigl(E(f(\mathbf{x}) \mid B_i, \mathbf{x}) \mid B_i\bigr) \\
&= \sum_x E(f(\mathbf{x}) \mid B_i, x) \, P(x \mid B_i) \\
&= \sum_x E(f(x) \mid B_i) \, P(x \mid B_i).
\end{aligned} \tag{1}$$

It will now be shown that the set of derived acts $F(\mathbf{x})$
satisfies the technical conditions imposed on the set of basic acts
$F$, so that the derived problem is also an objectivistic decision
problem. In fact, if every $f \in F$ is expressible in the form
$\sum f(r) f_r$ (with the usual condition on the $f(r)$'s), primary
acts for $F(\mathbf{x})$ analogous to the $f_r$'s can be defined by
attaching to every function $\mathbf{r} = r(x)$ an element
$f(\mathbf{x}; \mathbf{r})$ of $F(\mathbf{x})$, where

$$f(\mathbf{x}; \mathbf{r}) =_{Df} f_{r(\mathbf{x})}. \tag{2}$$

There are only a finite number of $f(\mathbf{x}; \mathbf{r})$'s, and
all elements of $F(\mathbf{x})$ are expressible as weighted averages of
them; the first assertion is obvious, and the second poses the problem
of finding, for any system of probability measures $\phi(r; x)$ on the
$r$'s, at least one probability measure on the set of functions
$\mathbf{r}$ with respect to which $P(r(x) = r) = \phi(r; x)$ for
every $r$ and $x$. The problem typically has many solutions; the
simplest is to let the $r(x)$'s, regarded for each $x$ as functions of
$\mathbf{r}$, be independent random variables on the set of
$\mathbf{r}$'s considered as a probability space, that is, to set

$$P(\mathbf{r}) = \prod_x \phi(r(x); x).$$

Formally, this particular solution leads to the identity

$$\begin{aligned}
f(x) &= \sum_r \phi(r; x) f_r \\
&= \sum_{\mathbf{r}} \Bigl\{ \prod_{x'} \phi(r(x'); x') \Bigr\} f_{r(x)}.
\end{aligned} \tag{3}$$

The identity and the fact that the coefficients in braces are
non-negative and add up to 1 are easy to check analytically, if it is
recognized that summation with respect to $\mathbf{r}$ means multiple
summation with respect to $r(1), r(2), \cdots$ (the $x$'s being for
definiteness supposed to take integral values). Equation (3) shows
incidentally that it is immaterial whether it is before or after the
observation that mixed acts are introduced.
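
The product construction behind (3) can be checked by brute force on a
small case. The sketch below is an illustration under assumed data: two
observation values, three primary acts, and an arbitrary system of
weights `phi`. It enumerates every function from observation values to
act indices, gives each the product weight, and recovers the prescribed
weights as marginals.

```python
from fractions import Fraction
from itertools import product

xs = (1, 2)        # values of the observation (assumed)
acts = (1, 2, 3)   # indices r of the primary acts (assumed)

# phi[x][r]: weight given to primary act r when x is observed (assumed).
phi = {1: {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)},
       2: {1: Fraction(1, 5), 2: Fraction(0), 3: Fraction(4, 5)}}

# Every function from observation values to act indices.
functions = [dict(zip(xs, values)) for values in product(acts, repeat=len(xs))]

def weight(r):
    """Product weight of the function r: the product over x of phi(r(x); x)."""
    w = Fraction(1)
    for x in xs:
        w *= phi[x][r[x]]
    return w

# The weights form a probability measure on the set of functions.
total = sum(weight(r) for r in functions)

# Probability that the randomly chosen function assigns act a to value x.
marginal = {x: {a: sum(weight(r) for r in functions if r[x] == a)
                for a in acts}
            for x in xs}

print(total)
print(marginal == phi)
```

Exact rational arithmetic is used so that the comparison with `phi` is
not blurred by rounding.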

Turn momentarily to the idea of observation in group decision
problems. Here the $E(f \mid B_i)$'s are replaced by $I(f; i)$'s, the
expected income of $f$ in the opinion of the $i$th person. There is no
partition $B_i$, except in a special, though theoretically important,
case, namely that of the $i$th person holding unequivocally that $B_i$
obtains.
The $P(x \mid B_i)$'s are here replaced by $P(x; i)$'s, the personal
distribution of $\mathbf{x}$ for the $i$th person. It is postulated
that, for each person, the conditional expectation of $f$ is unaffected
by knowledge of $\mathbf{x}$.
The derived acts are formally the same as for an objectivistic decision
problem, and the income function of the derived group decision problem
is

$$I(f(\mathbf{x}); i) = \sum_x I(f(x); i) \, P(x; i). \tag{4}$$

Returning to objectivistic problems, (9.4.1) defines the loss function
of the basic objectivistic problem and, mutatis mutandis, that of the
derived problem also, thus:

$$L(f(\mathbf{x}); i) = \max_{f'(\mathbf{x})} E(f'(\mathbf{x}) \mid B_i) - E(f(\mathbf{x}) \mid B_i). \tag{5}$$

The right side of (5) admits some simplification, for, if the person
knew which $B_i$ obtained, observation would be valueless to him.
Accordingly,

$$L(f(\mathbf{x}); i) = \max_{f'} E(f' \mid B_i) - E(f(\mathbf{x}) \mid B_i). \tag{6}$$

Analytically, the simplification is justified thus:

$$\begin{aligned}
\max_f E(f \mid B_i) &\le \max_{f(\mathbf{x})} E(f(\mathbf{x}) \mid B_i) \\
&= \max_{f(\mathbf{x})} \sum_x E(f(x) \mid B_i) \, P(x \mid B_i) \\
&\le \max_f E(f \mid B_i).
\end{aligned} \tag{7}$$

In discussing application of the minimax rule to the basic and derived
loss functions, it is doubly advantageous to introduce mixtures of the
$B_i$, for thereby the theory of bilinear games presented in Chapter 12
and that of partition problems (with some reinterpretation) can both be
brought to bear. Letting $\beta$ denote a generic system of weights
$\beta(i)$, $\beta(i) \ge 0$ and $\sum \beta(i) = 1$, and using the
notation of Chapter 7, the bilinear games associated with the primary
and derived problems are, respectively,

$$L(f; \beta) = l(\beta) - E(f \mid \beta), \tag{8}$$

$$\begin{aligned}
L(f(\mathbf{x}); \beta) &= l(\beta) - E(f(\mathbf{x}) \mid \beta) \\
&= l(\beta) - \sum_i \sum_x E(f(x) \mid B_i) \, P(x \mid B_i) \, \beta(i) \\
&= l(\beta) - \sum_x E(f(x) \mid \beta, x) \, P(x \mid \beta).
\end{aligned} \tag{9}$$

If necessary, (9) can be interpreted and verified by comparison with
(7.3.7) and (7.2.8), in that order.

In Chapter 7, $\beta(i)$ was generally required not only to be
non-negative, but also strictly positive; on examination, this slight
difference from the present context will be found innocuous. Again, in
Chapter 7, the statement and derivation of conclusions were, for
simplicity, nominally confined to twofold partition problems. Here the
extension of those conclusions to $n$-fold problems will be freely
used, though some readers may prefer here, as there, to focus on
twofold problems.

Letting $L^*$ denote the minimax (and maximin) value of the basic, and
$L^*(\mathbf{x})$ that of the derived problem, it is obvious, since
$F(\mathbf{x}) \supset F$, that $L^*(\mathbf{x}) \le L^*$; but there
is some interest in viewing this inequality as a consequence of
(7.3.4):

$$\begin{aligned}
L^*(\mathbf{x}) &= \max_\beta \min_{f(\mathbf{x})} L(f(\mathbf{x}); \beta) \\
&= \max_\beta \, [l(\beta) - v(F(\mathbf{x}) \mid \beta)] \\
&\le \max_\beta \, [l(\beta) - v(F \mid \beta)] \\
&= \max_\beta \min_f L(f; \beta) = L^*.
\end{aligned} \tag{10}$$

It is clear that the maximin $\beta$'s for the basic and derived
problems are the $\beta$'s that maximize the concave functions

$$h(\beta) =_{Df} l(\beta) - v(F \mid \beta) = l(\beta) - v(\beta) \tag{11}$$

and

$$h(\beta; \mathbf{x}) =_{Df} l(\beta) - v(F(\mathbf{x}) \mid \beta) = l(\beta) - E(v(\beta(\mathbf{x})) \mid \beta), \tag{12}$$

respectively. The search for minimax $f(\mathbf{x})$'s, for example, is
greatly narrowed by the consideration that, if $f(\mathbf{x})$ is
minimax, $E(f(\mathbf{x}) \mid \beta) = v(F(\mathbf{x}) \mid \beta)$
for some $\beta$, indeed for every maximin $\beta$. According to § 7.3,
equality obtains in (10), if and only if there is a maximin $\beta_0$
of the basic problem such that

$$\beta_0(i \mid x) = \frac{P(x \mid B_i) \, \beta_0(i)}{\sum_j P(x \mid B_j) \, \beta_0(j)}$$

is also a maximin of the basic problem for every $x$ such that
$\sum_j P(x \mid B_j) \beta_0(j) > 0$.

The most typical possibility, and the only one to be explored here, is
that the basic problem has a unique maximin $\beta_0$ with
$\beta_0(j) > 0$ for all $j$. Under this assumption,
$L^*(\mathbf{x}) = L^*$ if and only if $\mathbf{x}$ is utterly
irrelevant, as is easily shown.

In the same spirit, as can easily be shown, $L^*(\mathbf{x}) = 0$, if
$\mathbf{x}$ is definitive, but not typically otherwise; and, if
$\mathbf{x}$ extends $\mathbf{y}$, then
$L^*(\mathbf{x}) \le L^*(\mathbf{y})$ with equality if, and typically
only if, $\mathbf{y}$ is sufficient for $\mathbf{x}$.
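
The inequality between the two minimax values can be illustrated on a
twofold problem small enough to compute exactly. The sketch below is an
illustration under assumptions: the data are those stated in Exercise 5
of § 6 of this chapter (losses $|r - i|$, and an observation with
$P(1 \mid B_1) = 1/2$, $P(1 \mid B_2) = 1/4$), and the hypothetical
helper `maximin_value` relies on the fact that, for two states, the
minimum over primary acts of the loss against a mixture of the states
is concave and piecewise linear in the weight of the second state.

```python
from fractions import Fraction
from itertools import product

half, quarter = Fraction(1, 2), Fraction(1, 4)
# P[i][x] = P(x | B_i) for a two-valued observation.
P = {1: {1: half, 2: half}, 2: {1: quarter, 2: 1 - quarter}}
# Basic losses (state 1, state 2) of the primary acts f_1 and f_2.
basic = {r: (abs(r - 1), abs(r - 2)) for r in (1, 2)}

# Derived primary acts: the function (r(1), r(2)) has, in state i, the
# loss sum_x L(f_{r(x)}; i) P(x | B_i).
derived = {}
for r1, r2 in product((1, 2), repeat=2):
    derived[(r1, r2)] = tuple(
        P[i][1] * basic[r1][i - 1] + P[i][2] * basic[r2][i - 1]
        for i in (1, 2))

def maximin_value(losses):
    """Largest value, over mixtures of the two states, of the least loss."""
    lines = [(Fraction(l1), Fraction(l2)) for l1, l2 in losses.values()]

    def h(b):  # b is the weight attached to state 2
        return min((1 - b) * l1 + b * l2 for l1, l2 in lines)

    # The maximum is at b = 0, b = 1, or where two of the lines cross.
    candidates = [Fraction(0), Fraction(1)]
    for a1, a2 in lines:
        for c1, c2 in lines:
            denom = (a2 - a1) - (c2 - c1)
            if denom != 0 and 0 <= (c1 - a1) / denom <= 1:
                candidates.append((c1 - a1) / denom)
    return max(h(b) for b in candidates)

print(maximin_value(basic))    # value of the basic problem
print(maximin_value(derived))  # value of the derived problem
```

Since the games are bilinear, the maximin value computed here is also
the minimax value.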


3 Sufficient statistics 

Digressing from the minimax rule for a moment, something more
fundamental can be said about a sufficient statistic $\mathbf{y}$ of
$\mathbf{x}$. Namely, for every $f(\mathbf{x}) \in F(\mathbf{x})$,
there exists an $f(\mathbf{y}) \in F(\mathbf{y})$ such that
$I(f(\mathbf{y}); i) = I(f(\mathbf{x}); i)$ for every $i$. Indeed
$f(\mathbf{y}) =_{Df} E(f(\mathbf{x}) \mid \mathbf{y})$ defines such an
act. Without appeal to so weak a step as the minimax rule, this remark
demonstrates that even an objectivist loses nothing by exchanging
knowledge of an observation for knowledge of a sufficient statistic of
it. The remark might as well have been expressed in § 7.4, except that
there it would have involved some circumlocution, mixed acts not yet
having been introduced.
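
The remark can be verified on a small example. The sketch below is an
illustration under assumptions not found in the text: the observation
is a pair of independent trials, the sufficient statistic is the number
of successes, and the incomes and the pure rule `rule` are arbitrary
assumed choices. The mixed act based on the statistic is obtained by
averaging the rule over the conditional distribution of the observation
given the statistic, which does not depend on the state.

```python
from fractions import Fraction
from itertools import product

p = {1: Fraction(1, 4), 2: Fraction(3, 4)}   # P(trial = 1 | B_i), assumed
income = {1: {1: 4, 2: 0}, 2: {1: 1, 2: 3}}  # income[r][i] = E(f_r | B_i), assumed
xs = list(product((0, 1), repeat=2))         # values of x = (x_1, x_2)

def prob_x(x, i):
    """P(x | B_i) for two independent trials."""
    out = Fraction(1)
    for bit in x:
        out *= p[i] if bit else 1 - p[i]
    return out

def rule(x):
    """A derived act that is not a function of y: act 2 iff the first trial is 1."""
    return 2 if x[0] == 1 else 1

def weights(y, i):
    """Conditional law of rule(x) given x_1 + x_2 = y, computed under B_i."""
    same_y = [x for x in xs if sum(x) == y]
    total = sum(prob_x(x, i) for x in same_y)
    return {r: sum(prob_x(x, i) for x in same_y if rule(x) == r) / total
            for r in (1, 2)}

def income_x(i):
    """Income in state i of the pure rule based on x."""
    return sum(prob_x(x, i) * income[rule(x)][i] for x in xs)

def income_y(i):
    """Income in state i of the mixed act based on y alone."""
    return sum(prob_x(x, i)
               * sum(w * income[r][i] for r, w in weights(sum(x), 1).items())
               for x in xs)

for i in (1, 2):
    print(i, income_x(i), income_y(i))
```

The weights are computed under the first state only; sufficiency is
what makes that choice immaterial.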

4 Simple dichotomy, an example 

Much of what has been said thus far is well illustrated by the minimax
counterpart of Exercise 7.5.2. The reader is accordingly asked to
review that exercise and continue it thus:


Exercises 

1. For the problem in question:

(a) $h(\beta) = \delta_2\beta(1) + \delta_1\beta(2) - |\delta_1\beta(2) - \delta_2\beta(1)|$.

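(b)
$$\begin{aligned}
h(\beta; \mathbf{x}) &= \delta_2\beta(1) + \delta_1\beta(2) - \sum_x \bigl| \delta_1\beta(2) P(x \mid B_2) - \delta_2\beta(1) P(x \mid B_1) \bigr| \\
&= \delta_2 [2P(r_1 < r_1^*(\beta, \beta_0) \mid B_1) + P(r = r^*(\beta, \beta_0) \mid B_1)] \beta(1) \\
&\quad + \delta_1 [2P(r_2 < r_2^*(\beta, \beta_0) \mid B_2) + P(r = r^*(\beta, \beta_0) \mid B_2)] \beta(2).
\end{aligned}$$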
2a. A $\beta$ is maximin, if and only if $r^*(\beta, \beta_0)$ is such
that

$$\delta_2 P(r_1 < r_1^*(\beta, \beta_0) \mid B_1) \le \delta_1 P(r_2 \le r_2^*(\beta, \beta_0) \mid B_2) \tag{1}$$

and

$$\delta_2 P(r_1 \le r_1^*(\beta, \beta_0) \mid B_1) \ge \delta_1 P(r_2 < r_2^*(\beta, \beta_0) \mid B_2). \tag{2}$$

2b. There is typically only one maximin, but there may be a closed
interval of them.

3. Though the acts of $F$ and $F(\mathbf{x})$ as defined by Exercise
7.5.2 do not provide for mixed acts, it will suffice to consider
mixtures of the $f(\mathbf{x})$'s. Each of these will be determined by
an $i$, and nothing will be lost by requiring $i$ to be of the form
$i(r(x))$.

4a. Any minimax will be equivalent to a mixture of $f(\mathbf{x})$'s
each corresponding to a likelihood-ratio test associated with
$r^*(\beta, \beta_0)$ for every maximin $\beta$.

4b. In view of Exercise 3, the only likelihood-ratio tests that need
be considered for a maximin $\beta$ are

$i(r) = 1$, if and only if $r_1 < r_1^*(\beta, \beta_0)$;
$i(r) = 1$, if and only if $r_1 \le r_1^*(\beta, \beta_0)$.

These are not necessarily different tests.

5a. If the maximin $\beta$ is unique, the minimax act is unique (except
possibly for equivalent acts) and is a mixture of exactly two
$f(\mathbf{x})$'s corresponding to the two likelihood-ratio tests
defined in Exercise 4b.
This conclusion calls for some comment, for, in ordinary statistical
practice, one or the other of the extreme likelihood-ratio tests is
used, never a mixture. This practice is not in serious conflict with
the minimax rule, because the maximum loss associated with either
extreme is typically only slightly greater than $L^*(\mathbf{x})$.
Moreover, vagueness about the exact magnitude of $\delta_1$ and
$\delta_2$ would usually frustrate any attempt to calculate the
coefficients of the mixture. Incidentally, mixture is not called for at
all when $r$ is continuously distributed, for $h(\beta; \mathbf{x})$ is
then smooth rather than polygonal; that is, if
$P(r = r' \mid B_i) = 0$ for every $r'$ and both $i$'s, then
$h(\beta; \mathbf{x})$ has a continuous first derivative in $\beta$.
To show this and to show that the derivative is
$2\delta_2 P(r_1 < r_1^* \mid B_1) - 2\delta_1 P(r_2 < r_2^* \mid B_2)$
may be taken as an exercise only slightly beyond the usual mathematical
level of this book.

5b. If there is more than one maximin $\beta$, then any one that is not
extreme has only one likelihood-ratio test associated with it, and the
same one for all. The $f(\mathbf{x})$ corresponding to that test is
essentially the only minimax.

5 The approach to certainty

In concluding the paraphrase of § 7.1-6 that has thus far been the
subject of the present chapter, it should be mentioned that the
approach to certainty studied in § 7.6 obviously implies that the
corresponding $L^*(\mathbf{x}(n))$ approaches zero with increasing $n$.

6 Cost of observation 

A cost $c$ associated with an objectivistic observational problem
diminishes the income by $E(c \mid B_i)$ for each $i$ regardless of
$f$; that is, allowing for the cost, $I(f; i) = E(f - c \mid B_i)$.
But the cost, being unavoidable, does not affect the loss function, so
the minimax problem associated with the observation is independent of
the cost. The costs do intervene, however, in an essential way in the
problem of deciding which to choose of several available observations,
say $\mathbf{x}_a$ at cost $c_a$; it is important to bear in mind in
connection with this problem that a null observation at zero cost is
typically among the choices available in real life. The generic act of
this compound problem can conveniently be symbolized by
$\sum \lambda(a) f(\mathbf{x}_a)$, or sometimes simply by $\lambda$.
Here, of course, $\lambda(a) \ge 0$, $\sum \lambda(a) = 1$, for choice
of $\lambda$ means choice, for each $a$, of the probability
$\lambda(a)$ that the $a$th observation $\mathbf{x}_a$ will be made and
also choice of the derived act $f(\mathbf{x}_a)$ to be adopted in case
$\mathbf{x}_a$ is made. It is intuitively evident, and follows easily
from (1) below, that the mixture of several $\lambda$'s is also a
$\lambda$ as far as income is concerned, so mixtures of $\lambda$'s do
not require explicit consideration. The income function can be written

$$I(\lambda; i) = \sum_a \lambda(a) E(f(\mathbf{x}_a) - c_a \mid B_i). \tag{1}$$

Whence

$$\max_\lambda I(\lambda; i) = \max_f E(f \mid B_i) - \min_a E(c_a \mid B_i). \tag{2}$$

The loss function is accordingly

$$L(\lambda; \beta) = \sum_a \lambda(a) \{ L_a(f(\mathbf{x}_a); \beta) + d_a(\beta) \}, \tag{3}$$

where

$$d_a(\beta) =_{Df} \sum_i \{ E(c_a \mid B_i) - \min_{a'} E(c_{a'} \mid B_i) \} \beta(i) \tag{4}$$

and $L_a(f(\mathbf{x}_a); \beta)$ is the loss function of the
observational problem derived from the $a$th observation.

The compound minimax problem is intimately related to the concave
functions $h(\beta; \mathbf{x}_a)$ and the linear functions
$d_a(\beta)$, as is explained by the following exercises.

Exercises

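1. Show that

$$h_\lambda(\beta) =_{Df} \min_\lambda L(\lambda; \beta) = \min_a \, [h(\beta; \mathbf{x}_a) + d_a(\beta)]. \tag{5}$$

2. If $\lambda = 1 \cdot f'(\mathbf{x}_{a'})$, then
$L(\lambda; \beta) = h_\lambda(\beta)$, if and only if: first,

$$L_{a'}(f'(\mathbf{x}_{a'}); \beta) = h(\beta; \mathbf{x}_{a'}) \tag{6}$$

(in which case $f'(\mathbf{x}_{a'})$ will be called well adapted to
$\mathbf{x}_{a'}$ and $\beta$); and, second,

$$h(\beta; \mathbf{x}_{a'}) + d_{a'}(\beta) = \min_a \, [h(\beta; \mathbf{x}_a) + d_a(\beta)] \tag{7}$$

(in which case $\mathbf{x}_{a'}$ will be called well adapted to
$\beta$).

3a. Show that

$$\begin{aligned}
L_\lambda^* =_{Df} \min_\lambda \max_\beta L(\lambda; \beta) &= \max_\beta h_\lambda(\beta) \\
&\le \min_a \max_\beta \, [h(\beta; \mathbf{x}_a) + d_a(\beta)].
\end{aligned} \tag{8}$$

3b. Under the important special condition that the $d_a(\beta)$ are
equal to constants $d_a$, (8) specializes to

$$L_\lambda^* \le \min_a \, [L^*(\mathbf{x}_a) + d_a]. \tag{9}$$

3c. When can equality hold in (8) and (9)?

3d. $\beta'$ is maximin, if and only if $h_\lambda(\beta') = L_\lambda^*$.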

4. A $\lambda = \sum \lambda(a) f(\mathbf{x}_a)$ is minimax, if and
only if:

($\alpha$) For every $a$ for which $\lambda(a) > 0$, $\mathbf{x}_a$ is
well adapted to every maximin $\beta$, and $f(\mathbf{x}_a)$ is well
adapted to $\mathbf{x}_a$ and every maximin $\beta$.

($\beta$) $L(\lambda; i) \le L_\lambda^*$ for every $i$. (Of course
($\beta$) is alone necessary and sufficient; the point of the exercise
is that the necessary condition ($\alpha$) may conveniently confine the
search for minimax $\lambda$'s to relatively few candidates.)

5. Suppose that: ($\alpha$) $r$ and $i$ are confined to the values 1
and 2, and $L(f_r; i) = |r - i|$; ($\beta$) $x$ is confined to the
values 1 and 2, and $P(1 \mid B_1) = 1/2$, $P(1 \mid B_2) = 1/4$;
($\gamma$) $a$ is confined to the values 1 and 2, and the $\lambda$'s
of the compound problem attach weight $\lambda(1)$ to a basic act at
zero cost and $\lambda(2)$ to an act derived from $\mathbf{x}$ at a
non-negative constant cost $d$. Compute and graph $h(\beta)$,
$h(\beta; \mathbf{x})$, and (for various values of $d$)
$h_\lambda(\beta)$. Graph $L_\lambda^*$ as a function of $d$, and
discuss the minimax $\lambda$'s for various values of $d$.
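
A reader who wishes to check the graphs of Exercise 5 numerically might
proceed as in the following sketch. It is an illustration, not the
method of the text: the hypothetical function `compound_value` uses the
representation of the compound minimax value as the maximum over
mixtures of the two states of the smaller of the basic function and the
derived function raised by the cost, as in (5) and (8), and it relies on
that function being concave and piecewise linear in the weight of the
second state.

```python
from fractions import Fraction

def compound_value(d):
    """Compound minimax value for the data of Exercise 5 at constant cost d."""
    d = Fraction(d)
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    # Losses (state 1, state 2) of the basic primary acts, |r - i|.
    basic = [(Fraction(0), Fraction(1)), (Fraction(1), Fraction(0))]
    # Losses of the four derived primary acts for P(1|B_1)=1/2, P(1|B_2)=1/4.
    derived = basic + [(half, quarter), (half, 3 * quarter)]
    # Observing costs d whatever the state, so each derived line is raised by d.
    lines = basic + [(a + d, b + d) for a, b in derived]

    def h(b):  # b is the weight attached to state 2
        return min((1 - b) * l1 + b * l2 for l1, l2 in lines)

    # A concave piecewise linear function peaks at an end point or a crossing.
    candidates = [Fraction(0), Fraction(1)]
    for a1, a2 in lines:
        for c1, c2 in lines:
            denom = (a2 - a1) - (c2 - c1)
            if denom != 0 and 0 <= (c1 - a1) / denom <= 1:
                candidates.append((c1 - a1) / denom)
    return max(h(b) for b in candidates)

for d in (0, Fraction(1, 20), Fraction(1, 8), Fraction(1, 4)):
    print(d, compound_value(d))
```

For small positive costs the value printed lies strictly below both the
basic value and the derived value plus the cost, so that (9) can hold
with strict inequality.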

7 Sequential probability ratio procedures 

The type of decision problem that in § 7.7 led to the concept of a
sequential probability ratio procedure has an intimate counterpart in
an important type of compound objectivistic decision problem, for
which the concept was in fact originally developed by Wald [W2].
The $\mathbf{x}_a$'s of a problem of this type range over the enormous
variety of sequential observational programs associated with a sequence
of (conditionally) identically distributed random variables
$\mathbf{x}(1), \mathbf{x}(2), \cdots$.
The technical assumption that the $a$'s have a finite range is not
fulfilled, but, as in § 7.7, I proceed with some lapse of rigor,
referring to Wald's book [W3] or [A7] for the full details. Exercise
6.4 shows that attention may be confined to $a$'s that are well adapted
to at least one $\beta$, and that for those $a$'s it may be confined to
$f(\mathbf{x}_a)$'s that are well adapted to $\mathbf{x}_a$ and the
corresponding $\beta$. The way is paved by § 7.7, which states sharply
restrictive properties of the $\mathbf{x}_a$'s and $f(\mathbf{x}_a)$'s
that are so adapted. In some cases, recognition of these properties
contributes greatly to the possibility of actually computing minimax,
or nearly minimax, procedures for sequential problems.

8 Randomization 

Another important type of compound problems is illustrated by the
second example of § 9.6. A generalization of part of that example is
presented here to show how the minimax rule explains, or implies, the
process called randomization, which is one of the most striking
features of modern statistics, and one long antedating the minimax
rule. Randomization represents the only important use of mixed acts
that has thus far found favor with practicing statisticians, as will be
discussed in the next section. The exact meaning of randomization seems
a little elusive; no sharp definition is attempted here. But, roughly,
randomization is the selection of an observation at random, that is, of
a $\lambda$ with more than one $\lambda(a)$ actually positive, the
choice of the $\lambda(a)$'s and of the derived acts being governed
largely by symmetry. The following example provides at least a fairly
general illustration of the concept.

To set the stage and provide motivation for a formal statement, the
example will first be stated in language that is suggestive though a
little vague. The consequences of the basic acts in the example depend
on the composition of a population of $n$ objects, which may be thought
of as numbered from 1 through $n$. It may be known of some compositions
that they cannot occur, but, if a composition is considered possible,
all populations having that composition (irrespective of ordering) are
also considered possible. Each observation in the compound problem
consists in the cost-free observation of some $m$ of the objects, every
subset of exactly $m$ objects being available for observation.

Formally, the index $i$ of the partition runs over a certain set $I$ of
$n$-tuples, $\{i_1, \cdots, i_n\}$, of elements considered for
definiteness to be integers. If $i = \{i_1, \cdots, i_n\} \in I$, then
any permutation $Ti$ of $i$ is also in $I$. It is assumed that

$$E(f \mid B_i) = E(f \mid B_{Ti}) \tag{1}$$

for every $f \in F$, $i \in I$, and permutation $T$.

To every subset $A$ of $m$ integers,
$1 \le a_1(A) < a_2(A) < \cdots < a_{m-1}(A) < a_m(A) \le n$, there
corresponds an observation $\mathbf{x}(A)$ the possible values of which
are $m$-tuples $\{x_1(A), \cdots, x_m(A)\}$. The conditional
distributions of the $\mathbf{x}(A)$'s are defined thus: If
$x_1(A) = i_{a_1(A)}$, etc., then

$$P(x_1(A), \cdots, x_m(A) \mid B_i) = 1.$$

It is obvious that $L^*(\mathbf{x}(A))$ is the same for every $A$. In
typical applications this common value is little, if at all, less than
$L^*$.

If a compound act $\sum \lambda(A) f(\mathbf{x}(A))$ is to be chosen,
statistical common sense asserts that nothing is to be lost by:

(a) Letting $\lambda(A)$ be independent of $A$, and therefore equal to
$1 / \binom{n}{m}$ for every $A$; that is, letting every sample of size
$m$ have the same probability of being chosen, or randomizing, as it is
said.

(b) Letting $f(x_1(A), \cdots, x_m(A))$ be symmetric in its $m$
arguments and independent of $A$.

It can in fact be shown, by the method illustrated in the second
example of § 9.6 and discussed more generally in § 12.5, that there is
at least one minimax satisfying (a) and (b), and even that there is an
admissible one. Typically, if $m$ is large, but small compared to $n$,
$L_\lambda^*$ is much smaller than the common value of the
$L^*(\mathbf{x}(A))$'s.

The importance of randomization in applied statistics can scarcely be
exaggerated. From the personalistic viewpoint it is one of the most
important ways to bring groups of people into virtual unanimity; from
the objectivistic viewpoint it not only makes possible great reductions
in maximum loss, but it is seen as an invention by which the theory of
probability is brought to bear on situations to which probability on
first (objectivistic) sight would seem irrelevant.
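
The reduction in maximum loss that randomization makes possible can be
seen on a very small population. The following sketch is an
illustration under assumed data, not an example from the text: four
objects marked 0 or 1, two possible compositions (one 1 or three 1's),
a sample of a single object, and a loss of 1 for guessing the wrong
composition. With the sample fixed in advance, two states that show the
same object defeat every rule on the average; with the sample drawn at
random, a symmetric rule errs with small probability in every state.

```python
from fractions import Fraction
from itertools import permutations, product

n = 4
# States: every ordering of a population with one or with three 1's (assumed).
states = sorted(set(permutations((1, 0, 0, 0))) | set(permutations((1, 1, 1, 0))))

def loss(guess, state):
    """0 if the guessed number of 1's is right, 1 otherwise."""
    return 0 if guess == sum(state) else 1

# Pure derived acts for a single observed object: observed value -> guess.
rules = [dict(zip((0, 1), g)) for g in product((1, 3), repeat=2)]

# Fixed sample (the first object): two states that both show a 1 there.
hard = [(1, 0, 0, 0), (1, 1, 1, 0)]
# Average loss of each pure rule over the two hard states; any mixture
# of rules has the same average, hence a maximum loss at least as large.
fixed_avg = [Fraction(sum(loss(rule[s[0]], s) for s in hard), 2)
             for rule in rules]

def randomized_loss(state):
    """Expected loss when each object is observed with probability 1/n
    and three 1's are guessed exactly when the observed object is a 1."""
    return Fraction(sum(loss(3 if x == 1 else 1, state) for x in state), n)

worst_randomized = max(randomized_loss(s) for s in states)

print(fixed_avg)         # lower bound on the maximum loss with a fixed sample
print(worst_randomized)  # maximum loss of the randomized symmetric procedure
```

The loss depends on the state only through its composition, in keeping
with assumption (1) of this section.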

9 Mixed acts in statistics 

Many have commented that modern applied statistics makes one, but only
one, important use of mixed acts, namely in deciding, through the
process of randomization, what to observe. Thus, for example, once the
observation has been made, the derived act is in practice almost always
chosen, without mixing, from a set of basic acts natural to the
problem. This might seem to imply a sharp conflict between the minimax
rule and ordinary statistical practice, but actually it reflects
agreement, for mixed acts greatly reduce the minimax loss in
decision-problem interpretations of typical practical statistical
situations, when and only when ordinary practice calls for mixed acts
of the same sort, namely when randomization is called for.

There are certain mechanisms that systematically tend to make mixed
acts have relatively little, or even absolutely no, advantage over
unmixed acts. In the following discussion of these mechanisms, let
$L(r; i)$ be the abstract game on which a bilinear game $L(f; g)$ is
based.

In the first place, supposing that $L(r; i)$ is non-negative for every
$r$ and $i$ (as is appropriate to the context now at hand), (12.3.6)
can be completed, so to speak, thus:

$$\min(R, I) \, L^* \ge \min_r \max_i L(r; i), \tag{1}$$

where $R$ and $I$ denote for the moment the number of values of $r$ and
$i$, respectively, and $\min(R, I)$ is of course the minimum of the two
integers $R$ and $I$. An inequality stronger than (1) will actually be
proved.

Consider a minimax $f$ for which the smallest possible number $R'$ of
the $f(r)$'s are actually positive.

$$\begin{aligned}
R' L^* &= \max_i R' \sum_r L(r; i) f(r) \\
&\ge \max_i L(r'; i) \\
&\ge \min_r \max_i L(r; i),
\end{aligned} \tag{2}$$

where $r'$ is so chosen that $R' f(r') \ge 1$, as can obviously be
done. It is known [B19] that $R' \le \min(R, I)$.

The important lesson of (1) is that, unless $R$ and $I$ are both large,
the introduction of mixed acts cannot reduce the minimax loss to a
very small fraction of the value it would otherwise have.
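
Inequality (1) can be tried out on a game for which the minimax value is
known by symmetry. The sketch below is an illustration on an assumed
example, the square game in which the loss is 1 when $r = i$ and 0
otherwise; the hypothetical function `check_bound` certifies the value
of the game by exhibiting uniform mixtures for both sides and then
compares the two sides of (1).

```python
from fractions import Fraction

def check_bound(n):
    """Compare both sides of (1) for the n-by-n game L(r; i) = [r == i]."""
    R = I = n
    L = [[1 if r == i else 0 for i in range(I)] for r in range(R)]
    # Loss of the uniform mixture of the r's against the worst i.
    upper = max(sum(Fraction(L[r][i], R) for r in range(R)) for i in range(I))
    # Loss of the best r against the uniform mixture of the i's.
    lower = min(sum(Fraction(L[r][i], I) for i in range(I)) for r in range(R))
    assert upper == lower        # the two bounds meet, so both equal L*
    pure = min(max(row) for row in L)   # min over r of max over i, no mixing
    return upper, pure, min(R, I) * upper >= pure

print(check_bound(5))
```

In this assumed game the two sides of (1) are equal, so the factor
$\min(R, I)$ cannot in general be replaced by anything smaller.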
To mention a different mechanism, Figure 12.4.1 suggests that, if
there are many $r$'s, the corners of the concave function emphasized in
that figure may well be very blunt, in which case a minimax mixed act
has almost as high a maximum loss as any one of its components. When
the number of $r$'s is infinite, the concave function may well be
differentiable, in which case mixed acts have absolutely no advantage.
The remark appended to Exercise 4.5a is pertinent here.
This mechanism can be related to a certain large class of infinite
abstract (i.e., not necessarily bilinear) games, discovered by Kakutani
[K1], for which $L^* = L_*$. Bilinear games are but a special case of
these, and numerous others seem to arise frequently in applications.
If $L^* = L_*$ for an abstract game, nothing at all can be gained by
adjoining mixed acts, as (12.3.5) shows.

Finally, it may be mentioned that in many cases where an observation
$\mathbf{x}$ might be followed by a mixed derived act, the same, or
nearly the same, consequences can often be realized by a pure act.
Speaking a little loosely, this occurs whenever $\mathbf{x}$ has a
continuous or nearly continuous contraction $\mathbf{y}$ that is
irrelevant, or nearly irrelevant, for then $\mathbf{y}$ can play the
role in selecting a basic derived act that would otherwise be assigned
to a table of random numbers. If, for example, $\mathbf{x}$ is
continuous, $y(x)$ can be taken as the last few digits in the decimal
expansion of $x$ to an extravagant number of places. Again if,
conditionally, $\mathbf{x} = \{x_1, \cdots, x_n\}$ is an $n$-tuple of
continuously, identically, and independently distributed real random
variables, $y(x)$ may be taken as the permutation that ranks the $x$'s
in ascending order, provided that $n$ is fairly large: 10 should
satisfy almost any need.
A recent technical reference on the superfluousness of mixed acts in
the presence of continuous observations is [D13].

I have occasionally heard it conjectured that any mixed act made after
the observation (in an observational decision problem) is wrong in
principle. I would argue that the conjecture is mistaken thus: Any
observational problem that calls for randomization can be simulated, so
far as its loss function $L(r; i)$ is concerned, by a basic problem. A
mixed act will be as appropriate to the basic problem as it was to the
observational problem from which the basic one was derived. In this way
a great variety of situations calling for mixed acts having nothing to
do with choice of observation can be constructed, though they seem to
be atypical in practice. Moreover, any basic problem can obviously
occur as the decision problem remaining after some particular value $x$
of an observation has been observed, so the situations just constructed
lead to closely related ones calling for mixed acts after observation.
Less abstractly, consider a person choosing from a tray of assorted
French pastries. Even after extensive visual observation and
interrogation of the waiter, the person might justifiably introduce
considerable mixture into his choice.

I think that the conjecture that mixed acts are necessarily
inappropriate after observations stems partly from the mechanisms that
do tend to make such acts inappropriate or unimportant in many typical
cases and partly from justifiable dissatisfaction with specific mixed
acts that have from time to time been suggested by statisticians. For
example, the suggestion that ties in rank arising in non-parametric
tests be removed by ranking the tied observations at random may in
many, or perhaps all, cases fairly be regarded with suspicion.



CHAPTER 15 


Point Estimation 

1 Introduction 

This chapter discusses point estimation, and the next two discuss the
testing of hypotheses and interval estimation, respectively.
Definitions of these processes must be sought in due course, but, for
the moment, whatever notions about them you happen to have will afford
sufficient background for certain introductory remarks applying equally
well to both kinds of estimation and to testing.
Estimating and testing have been, and inertia alone would insure that
they will long continue to be, cornerstones of practical statistics.
Their development has until recently been almost exclusively in the
verbalistic tradition, or outlook. For example, testing and interval
estimation have often been expressed as problems of making assertions,
on the basis of evidence, according to systems that lead, with high
probability, to true assertions, and point estimation has even been
decried as ill-conceived because it is not so expressible.
Wald's minimax theory has, as was explained in § 9.2, stimulated
interest in the interpretation of problems of estimation and testing
in behavioralistic terms; to objectivists this has, of course, meant
interpretation as objectivistic decision problems. For reasons
discussed in § 9.2, it does seem to me that any verbalistic concept in
statistics owes whatever value it may have to the possibility of one or
more behavioralistic interpretations.

The task of any such interpretation from one framework of ideas to
another is necessarily delicate. In the present instance, there is a
particular temptation to force the interpretation, namely, so that
criteria proposed by the verbalistic outlook are translated into
applications of the minimax theory, that is, of the minimax rule and
the sure-thing principle (as expressed by the criterion of
admissibility), for these are the only general criteria thus far
proposed and seriously maintained for the solution of objectivistic
decision problems. Of course it is to be expected, and I hope later
sections of this chapter and the next demonstrate, that unforced
interpretations do often translate verbalistic criteria into
applications of the behavioralistic ones. In evaluating any such
interpretations, it must be borne in mind that an analogy of great
mathematical value may be valueless as an interpretation;
correspondingly, what is put forward as mere analogy should not be
taken to be an interpretation, much less branded as a forced one. For
example, attention has already been called (in § 11.4) to the danger of
regarding the analogy between the theory of two-person games and that
of the minimax rule for objectivistic decision problems as an
interpretation. In fact, minimax problems are of such mathematical
generality that they arise, even within statistics, in contexts other
than direct application of the minimax rule to objectivistic decision
problems; a striking, though technical, example is Theorem 2.26 of
Wald's book [W3].

The literature of estimation and testing is vast; indeed it has, I
think, been seriously contended that statistics treats of no other
subjects. This chapter and the next two cannot, therefore, pretend to
present a complete digest of that literature, even so far as it pertains
to the foundations of statistics. For further reading certain chapters
of Kendall's treatise [K2] may be recommended as a key reference to the
verbalistic tradition (Chapters 17 and 18 for point estimation, 19 and
20 for interval estimation, 21, 26, and 27 for testing). Many newer
aspects are treated in Wald's book [W3], and a recent review of testing
by Lehmann [L4] is recommended.

2 The verbalistic concept of point estimation 

Abstractly and very generally, but in verbalistic language (which is
necessarily vague), the problem of point estimation is this: Knowing
$P(x \mid B_i)$ for every $i$ and having observed the value $x$, guess
the value $\lambda$ of a prescribed function, or parameter as it is
often called, $\lambda(i)$ with values in a set $\Lambda$.
Semi-behavioralistically this is, I think universally, understood to
mean that a function $\mathbf{l}$ associating a value
$l(x) \in \Lambda$ with each $x$ (or possibly a mixture of such
functions) is to be decided on, the function $\mathbf{l}$ being called
an estimate (or, to be complete, a point estimate) of the parameter
$\lambda$. A problem of point estimation has, thus, some of the
structure of an objectivistic observational problem; but, since nothing
has yet been said about the income, or consequence, resulting from the
act $\mathbf{l}$ in case $B_i$ obtains, it is at the moment impossible
to advance criteria for the choice of $\mathbf{l}$.

3 Examples of problems of point estimation 

It will now be well to present some examples after a few words of
preparation. For simplicity, $\Lambda$ will henceforth generally be
supposed to be an interval (possibly unbounded) of real numbers. If
$\lambda(i) = \lambda(i')$ implies $i = i'$, then $\lambda$ rather than
$i$ can be used to index the partition; such an estimation problem is
said to be free of nuisance parameters. This usage corresponds to the
fact that the $i$'s can typically be represented as ordered couples
$(\lambda, \theta)$, where $\lambda$ is of course $\lambda(i)$ and
$\theta$ is called the nuisance parameter; if $\theta$ in turn happens
to be represented as an ordered $n$-tuple, ordinary usage calls $\theta$
an $n$-tuple of nuisance parameters. It must be recognized as atypical
in estimation problems for $i$ or $\lambda$ to be confined to a finite
set of values, and often $x$ is not so confined either. It will
therefore be necessary to proceed heuristically into domains where the
mathematically limited theory developed in this book does not rigorously
apply.

The specific estimation problems most commonly cited as examples,
and most important in practice, are summarized in Table 1, together
with their maximum-likelihood estimates, that is, estimates constructed
in accordance with a rule to be defined in § 4. All but the last two
examples of Table 1 are free of nuisance parameters.

4 Criteria that have been proposed for point estimates 

As a matter of fact, verbalistic treatments typically do give some
inkling of the consequence of the act $\mathbf{l}$ when $B_i$ obtains.
Thus, in examples commonly cited, such as those in Table 3.1, $\Lambda$
is a set of real numbers or a set of $n$-tuples of real numbers and,
therefore, a set of objects between which the notion of proximity has
some meaning. Work in the verbalistic tradition has made it clear in
connection with such examples that, if $l = \lambda(i)$ for the $B_i$
that obtains, the guess is considered perfect and that, roughly
speaking, it is considered rather poor if $l$ is far from $\lambda$.

In spite of the apparently hopeless indefiniteness of estimation
problems even as thus formulated, various criteria, or desiderata, for
estimates have been suggested. A list of these criteria, intended to be
essentially complete, is now presented. Each item is annotated and
illustrated to make its meaning clear, and sometimes to call attention
to related criteria not explicitly listed; motivation and criticism are,
however, deferred until later sections, where they are treated in
connection with explicit hypotheses about the consequences of
misestimation.

No attempt is made to include criteria like intellectual simplicity or
facility of computation that depend not only on the estimate but also
on the capabilities of the people who contemplate using it. The list
is in a sense logically inhomogeneous. For example, no one really
considers it a virtue in itself for an estimate to be a
maximum-likelihood estimate (Criterion 4); rather, it is believed that
such estimates typically have real virtues.

It has, to begin the list of criteria, been suggested by one person or
another that:

1. If $y$ is sufficient, nothing is to be lost by requiring the estimate
$\mathbf{l}$ to be a contraction of $y$.

It will be instructive to bear in mind that necessary and sufficient
statistics of the examples (a)-(f) in Table 3.1 are, respectively, $x$,
$x$, $\bar{x}$, $\sum x_r^2$, $(\bar{x}, \sum x_r^2)$, and
$(\bar{x}, \sum x_r^2)$.

2. If, of two estimates $\mathbf{l}$ and $\mathbf{l}'$,

(1)  $E([l - \lambda(i)]^2 \mid B_i) \le E([l' - \lambda(i)]^2 \mid B_i)$

for every $i$, with strict inequality for some $i$, then $\mathbf{l}$ is
better than $\mathbf{l}'$.

There are countless variants of this idea. In particular, the square
of the difference may be replaced by any other positive power of the
absolute difference. Again, (1) may be imposed at only one value of
$i$, if $\mathbf{l}$ and $\mathbf{l}'$ are subjected to some other
condition, freedom from bias (Criterion 6 below) being the popular one.

Example (f) gives rise to a good illustration of this criterion, which
is also interesting in a later connection. Letting
$Q = \sum_r (x_r - \bar{x})^2$, it is well known that
$E(Q \mid \mu, \sigma^2) = (n - 1)\sigma^2$ and that
$E(Q^2 \mid \mu, \sigma^2) = (n^2 - 1)\sigma^4$. Therefore

(2)  $E([aQ - \sigma^2]^2 \mid \mu, \sigma^2) = \{a^2(n^2 - 1) - 2a(n - 1) + 1\}\sigma^4$

     $= \left\{(n^2 - 1)\left(a - \frac{1}{n + 1}\right)^2 + \frac{2}{n + 1}\right\}\sigma^4$

     $\ge \frac{2\sigma^4}{n + 1}$

for all real $a$, with equality if and only if $a = (n + 1)^{-1}$,
omitting the pathological but trivial case that $n = 1$. By the
criterion in question, $Q/(n + 1)$ is therefore better than any other
estimate of the form $aQ$, including the maximum-likelihood estimate
$Q/n$ and the unbiased estimate $Q/(n - 1)$.
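
The algebra in (2) can be checked with exact rational arithmetic. The following Python sketch is only an illustration under the moments quoted above, $E(Q) = (n - 1)\sigma^2$ and $E(Q^2) = (n^2 - 1)\sigma^4$; the names `mse_factor` and `compare` are hypothetical helpers introduced here, and the factor returned is the expected squared error divided by $\sigma^4$.

```python
from fractions import Fraction


def mse_factor(a, n):
    """E([aQ - sigma^2]^2) / sigma^4, using E(Q) = (n-1) sigma^2
    and E(Q^2) = (n^2 - 1) sigma^4 as quoted in the text."""
    a = Fraction(a)
    return a * a * (n * n - 1) - 2 * a * (n - 1) + 1


def compare(n):
    """Expected squared error factors for the three multiples of Q
    singled out by Criteria 2, 4, and 6."""
    return {
        "a = 1/(n+1)": mse_factor(Fraction(1, n + 1), n),
        "a = 1/n (maximum likelihood)": mse_factor(Fraction(1, n), n),
        "a = 1/(n-1) (unbiased)": mse_factor(Fraction(1, n - 1), n),
    }


if __name__ == "__main__":
    for label, factor in compare(10).items():
        print(f"{label}: {factor} = {float(factor):.4f}")
```

For $n = 10$ the three factors are 2/11, 19/100, and 2/9, so the ordering asserted in the text is visible directly.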

3. If, of two estimates $\mathbf{l}$ and $\mathbf{l}'$,

(3)  $P(-\epsilon_1 \le l(x) - \lambda(i) \le \epsilon_2 \mid B_i) \ge P(-\epsilon_1 \le l'(x) - \lambda(i) \le \epsilon_2 \mid B_i)$

for every non-negative $\epsilon_1$ and $\epsilon_2$ and for every $i$,
with strict inequality for some $\epsilon_1$, $\epsilon_2$, and some
$i$, then $\mathbf{l}$ is better than $\mathbf{l}'$.

Acceptance of this criterion is obviously implied by acceptance of
Criterion 2, of which it may therefore be regarded as a skeptical
counterpart; formal demonstration of a much more general assertion will
be given in connection with (5.2-4). The criterion implies, for example,
in connection with (c) of Table 3.1 that $\bar{x}$ is superior to any
other weighted average of the $x_r$'s. A more interesting example will
be mentioned in connection with Criterion 5.

That modification of Criterion 3 in which it is concluded only that
$\mathbf{l}$ is at least as good as $\mathbf{l}'$ is of some technical
interest. Incidentally, if equality held identically in (3), there would
presumably be nothing to choose between the two estimates by any
reasonable criterion, for they would then both have the same system of
conditional distributions.

4. A maximum-likelihood estimate is often a rather good estimate.

A maximum-likelihood estimate is an estimate $\mathbf{l}$ such that, for
some function $\mathbf{i}$ of $x$, $l(x) = \lambda(\mathbf{i}(x))$ and

(4)  $P(x \mid B_{\mathbf{i}(x)}) \ge P(x \mid B_i)$

for every $i$ and $x$. In many natural problems there is only one
maximum-likelihood estimate. Taking into account the analogy between
probabilities and values of probability densities, the reader should
verify that the estimates listed in Table 3.1 are indeed the unique
maximum-likelihood estimates of the problems to which they refer. When
there is a unique maximum-likelihood estimate, it is obviously a
contraction of the likelihood ratios and, therefore, of any sufficient
statistic, which fits neatly with Criterion 1.
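
Condition (4) can be illustrated numerically for the binomial example (a). The sketch below is an illustration only: it assumes the familiar binomial likelihood and searches a finite grid of values of $p$, so it checks rather than proves that the observed proportion $x/n$ maximizes the likelihood. The function names and the grid size are assumptions made for the sketch.

```python
import math


def binomial_likelihood(x, n, p):
    """P(x | p) for x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def grid_maximum_likelihood(x, n, steps=1000):
    """Value of p on the grid 0, 1/steps, ..., 1 with the largest likelihood.

    This mimics (4): the chosen p makes the observed x at least as
    probable as any competing grid value does.
    """
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: binomial_likelihood(x, n, p))


if __name__ == "__main__":
    print(grid_maximum_likelihood(7, 20))  # compare with 7 / 20
```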

5. A good estimate should have the same symmetry as the problem.

More precisely, if a permutation $T$ of the $i$'s and the $x$'s is such
that

(5)  $P(Tx \mid B_{Ti}) = P(x \mid B_i),$

and such that $\lambda(i) = \lambda(i')$ implies
$\lambda(Ti) = \lambda(Ti')$, then $\mathbf{l}$ should be such that, if
$l(x) = \lambda(i)$, $l(Tx) = \lambda(Ti)$.

For example, adopting also Criterion 1, a good estimate for $\mu$ in (c)
may be sought of the form $l(\bar{x})$. Symmetry then dictates
$l(\bar{x} + a) = l(\bar{x}) + a$ and $l(-\bar{x}) = -l(\bar{x})$; in
short, $l(\bar{x}) = \bar{x}$.

The same conclusion can be drawn for (e), though with a little more
trouble. The criterion applied to (f) leads to estimates of the form
$aQ$. The constant $a$ might be fixed by appealing, for example, to
Criterion 2, 4, or 6. These alone give three slightly different
determinations: $a^{-1} = (n + 1)$, $n$, and $(n - 1)$, respectively.

Again, it can be shown for Examples (c) and (e) that, among all
estimates satisfying Criterion 5, $\bar{x}$ is best according to
Criterion 3.

6. It is desirable that the estimate be unbiased.

An estimate $\mathbf{l}$ is called unbiased, if and only if

(6)  $E(\mathbf{l} \mid B_i) = \lambda(i)$

for every $i$.

It is easy to verify that the maximum-likelihood estimates of (a)-(e)
in Table 3.1 are all unbiased; that of (f), however, is not, for
$E(Q/n \mid \mu, \sigma^2) = (1 - 1/n)\sigma^2$ instead of $\sigma^2$.
Again, if $\mathbf{l}$ is a maximum-likelihood estimate of $\lambda$,
$\mathbf{l}^2$ is a maximum-likelihood estimate of $\lambda^2$. But, if
$\mathbf{l}$ is not definitive, and $\mathbf{l}$ is an unbiased estimate
of $\lambda$, $\mathbf{l}^2$ is not an unbiased estimate of $\lambda^2$,
as Theorem 1 of Appendix 2 implies.

7. If $P(|\mathbf{l} - \lambda(i)| < |\mathbf{l}' - \lambda(i)| \mid B_i) > 1/2$
for every $i$, then $\mathbf{l}$ is better than $\mathbf{l}'$.

Any resemblance between this criterion and Criterion 3 seems to be
dispelled by the following example. Suppose that, for every $i$,
$P(\mathbf{l} - \lambda(i) = a,\ \mathbf{l}' - \lambda(i) = b \mid B_i)$
equals 2/11 if $a$ and $b$ are integers such that $0 \le a < b \le 2$,
equals 5/11 if $a$ and $b$ are 2 and 0 respectively, and equals 0
otherwise. According to Criterion 7, $\mathbf{l}$ is better than
$\mathbf{l}'$, because 6/11 > 1/2; but, according to Criterion 3,
$\mathbf{l}'$ is better than $\mathbf{l}$, because 5/11 > 4/11 and
7/11 > 6/11. The example can easily be modified to suit any taste for
symmetry and continuity. But, if $\mathbf{l}$ and $\mathbf{l}'$ are
conditionally independent (which is not a natural assumption), and
$\mathbf{l}$ is better than $\mathbf{l}'$ according to Criterion 7,
then, as may easily be shown, $\mathbf{l}'$ cannot be better than
$\mathbf{l}$ by Criterion 3.
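
The fractions quoted in this example can be recomputed from the joint distribution of the two errors. The following Python sketch is an illustration with hypothetical helper names; since both errors are non-negative in the example, the interval in Criterion 3 reduces to an upper bound on the error.

```python
from fractions import Fraction

# JOINT[(a, b)] = P(l - lambda = a, l' - lambda = b | B_i), as in the example.
JOINT = {
    (0, 1): Fraction(2, 11),
    (0, 2): Fraction(2, 11),
    (1, 2): Fraction(2, 11),
    (2, 0): Fraction(5, 11),
}


def prob_first_closer(joint):
    """Criterion 7: probability that l is strictly closer than l'."""
    return sum(p for (a, b), p in joint.items() if abs(a) < abs(b))


def prob_error_at_most(joint, which, bound):
    """Criterion 3 with non-negative errors: P(error <= bound).

    which = 0 selects the error of l, which = 1 the error of l'.
    """
    return sum(p for errors, p in joint.items() if errors[which] <= bound)


if __name__ == "__main__":
    print("P(l closer):", prob_first_closer(JOINT))
    for bound in (0, 1):
        print(bound,
              prob_error_at_most(JOINT, 0, bound),
              prob_error_at_most(JOINT, 1, bound))
```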

The list of criteria is here interrupted by several paragraphs of
explanation in preparation for two concluding criteria.

The approach to certainty treated in §§ 3.6 and 7.6 has its counterpart
in the theory of estimation. In particular, if
$x(n) = \{x_1, \ldots, x_n\}$ is an $n$-tuple of conditionally
independent and identically distributed observations, there will
typically exist sequences of estimates $\mathbf{l}(n)$ based on $x(n)$,
such that

(7)  $\lim_{n \to \infty} P(|l(x(n), n) - \lambda(i)| < \epsilon \mid B_i) = 1$

for every positive $\epsilon$ and every $i$. A sequence of estimates
satisfying (7) relative to any sequence of observations $x(n)$ (not
necessarily $n$-tuples of conditionally independent observations) is
called consistent.

The condition of consistency is often realized in a very special way,
namely that the error $[l(x(n), n) - \lambda(i)]$ is, for every $i$ and
for large $n$, practically normally distributed about zero with variance
inversely proportional to $n$. More formally, a sequence of estimates
may be such that

(8)  $\lim_{n \to \infty} P\left(\frac{n^{1/2}[l(x(n), n) - \lambda(i)]}{\sigma(i)} \le t \,\Big|\, B_i\right) = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{t} e^{-z^2/2}\, dz$

for every $t$ and $i$, where $\sigma(i)$ is some positive function of
$i$; it is then said that $n^{1/2}[l(x(n), n) - \lambda(i)]$ is
asymptotically normal about zero with asymptotic variance
$\sigma^2(i)$. If, in addition, for every $i$, $\sigma^{-2}(i)$ is not
less than a certain function, the differential information, to be
defined in § 6, then the sequence $\mathbf{l}(n)$ is called efficient.

There is a possible pitfall in connection with the idea of asymptotic
normality. Though (8) implies that, for large $n$, the distribution of
the error is, in a sense, almost the normal distribution with zero mean
and variance $\sigma^2(i)/n$, it does not imply that the mean of the
error is close to zero, or even finite or well defined. Similarly, the
variance of the error may be much larger than $\sigma^2(i)/n$, infinite,
or ill defined, but it cannot, for large $n$, be smaller than
$\sigma^2(i)/n$ by a fixed fraction or less.

Much literature on estimation has concentrated on sequences of
estimation problems in which $x(n)$ is an $n$-tuple consisting of the
first $n$ elements of an infinite sequence of conditionally independent
and conditionally identically distributed random variables or, as it
will be called in the present chapter, a standard sequence, because
these are the simplest examples of sequences of increasingly informative
observations. Examples (c)-(f) in Table 3.1 refer directly to standard
sequences; the binomial distributions (a) can be regarded as the
distribution of the sufficient statistic $\sum x_r$ of the standard
sequence $x(n)$ in which each $x_r$ takes the values 1 and 0 with
probabilities $p$ and $1 - p$, respectively (cf. Exercise 7.4.1); again,
if each $x_r$ is Poisson-distributed with parameter $\mu$, then
$\sum x_r$ is sufficient for $x(n)$ and is itself Poisson-distributed
with parameter $n\mu$. Thus, all the examples in Table 3.1 give rise
more or less directly to examples of standard sequences.

In speaking of standard, and occasionally of other, sequences the
ellipsis of referring to a sequence of estimates simply as “an estimate”
has been widely adopted, so one reads recommendations that “an
estimate” should be consistent or efficient. This ellipsis, though often
convenient, sometimes proves dangerous. It distracts from the fact
that a person is called upon to make an estimate, not a sequence of
estimates, so that the question of what constitutes a good sequence does
not arise. Again, it makes one feel that if an estimate, say
$\mathbf{l}_{13}$, has been defined for $x(13)$, then the definition of
$\mathbf{l}_{14}$ is thereby implied. One forgets, for example, that
“the” average of $n$ observations is a whole sequence of statistics, a
sequence singled out by human tastes and interests, rather than by any
mathematical necessity. In short, the ellipsis establishes the
atmosphere of the logically nonsensical (though perhaps psychologically
revealing) questions on intelligence tests such as

“What are the two missing terms in the sequence 1 8 2 8 1 8 2 8 _ _ ?”*

The recommendations of consistency and efficiency quoted above can
be added to the numbered list of suggestions, in a form that avoids the
ellipsis.

8. If each $\mathbf{l}(n)$ is a good estimate for the corresponding
$x(n)$ of a standard sequence, then the sequence $\mathbf{l}(n)$ is
consistent.

The sequences of maximum-likelihood estimates of the sequences of
problems (a), (c)-(f) are consistent, and, for the sequence of problems
of estimating $\mu$ from an observation $y_n$ Poisson-distributed with
parameter $n\mu$, the maximum-likelihood estimates $y_n/n$ are
consistent.

If there is one consistent sequence of estimates for a sequence of
problems, there is a plethora. Each term of a consistent sequence can,
for example, be multiplied by $(1 + n^{-1/2})$ without destroying
consistency. Again, the sample medians† are in (c) a consistent sequence
different from the sequence of maximum-likelihood estimates.
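
Consistency in the sense of (7) can be watched at work in the binomial example (a), where the estimate is the observed proportion. The following Python sketch is an illustration under assumed values ($p = 3/10$, $\epsilon = 1/20$, chosen here for the example); it sums exact binomial probabilities, using logarithms to avoid overflow, and exact fractions to decide which outcomes fall strictly within $\epsilon$ of $p$.

```python
import math
from fractions import Fraction


def log_binomial_pmf(x, n, p):
    """Logarithm of the binomial probability of x successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log(1.0 - p))


def prob_within(n, p=Fraction(3, 10), eps=Fraction(1, 20)):
    """P(|x/n - p| < eps | p): the probability appearing in (7)."""
    p_float = float(p)
    return sum(
        math.exp(log_binomial_pmf(x, n, p_float))
        for x in range(n + 1)
        if abs(Fraction(x, n) - p) < eps  # exact comparison, no rounding
    )


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(prob_within(n), 4))
```

The probabilities increase toward 1 as $n$ grows, which is all that (7) asserts; nothing is implied about how good any single term of the sequence is.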

9. Under the hypothesis of Criterion 8, the sequence $\mathbf{l}(n)$ is
efficient, at least if any efficient sequence of estimates exists.

The six sequences of maximum-likelihood estimates mentioned under
Criterion 8 are all well known to be efficient, as sequences of
maximum-likelihood estimates for standard sequences typically are. The
asymptotic variances and certain other interesting quantities associated
with these six sequences are presented in Table 1. It is remarkable
that, for each of the examples in Table 1, the expected values of the
estimates approach the estimated parameter; $n$ times the variance of
the estimate, and $n$ times the expected squared error, both approach
the asymptotic variance of $n^{1/2}$ times the error. For the first five
examples the relations mentioned hold, indeed, not only in the limit,
but exactly, for all $n$. All six examples are rather special, or
magical, but the limiting relations just mentioned may fairly be
expected to hold in some generality, though they are not (as has already
been mentioned) really implied by the asymptotic normality of the
sequence of errors times $n^{1/2}$. To illustrate the exceptions that
can occur, $|\bar{x}|^{-1}$ is, in (c), the maximum-likelihood estimate
of $|\mu|^{-1}$ for $\mu \ne 0$; this sequence of estimates is
efficient, and $n^{1/2}(|\bar{x}|^{-1} - |\mu|^{-1})$ is asymptotically
normal about zero with asymptotic variance $\mu^{-4}$; but the other
three entries for Table 1 are infinite in this example.

* $e = 2.7182818285$ to eleven significant figures.

† See any statistics text for definition, if necessary.

Table 1  Examples of behavior of maximum-likelihood estimates

Sequence         Mean                  n × variance             n × expected            Asymptotic variance
                                                                square of error         of n^{1/2} × error

(a)              $p$                   $pq$                     $pq$                    $pq$
Poisson $n\mu$   $\mu$                 $\mu$                    $\mu$                   $\mu$
(c)              $\mu$                 1                        1                       1
(d)              $\sigma^2$            $2\sigma^4$              $2\sigma^4$             $2\sigma^4$
(e)              $\mu$                 $\sigma^2$               $\sigma^2$              $\sigma^2$
(f)              $(1 - 1/n)\sigma^2$   $2(1 - 1/n)\sigma^4$     $(2 - 1/n)\sigma^4$     $2\sigma^4$

As in the case of consistency, where there is one efficient sequence,
there are many, but efficiency is, of course, a much more restrictive
property than consistency. For example, multiplication by
$(1 + n^{-1/2})$ typically destroys efficiency, though multiplication by
$(1 + n^{-1})$ never does. Again, the consistent sequence of medians
mentioned under Criterion 8 is not efficient. Indeed, it is well known
of that sequence that the sequence of errors times $n^{1/2}$ is
asymptotically normal about zero with asymptotic variance $\pi/2$
rather than 1.
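
The contrast between the asymptotic variances 1 and $\pi/2$ can be made tangible by simulation. The sketch below is an illustration only: it assumes example (c) with true mean 0 and unit variance, a sample size of 101, and a seeded pseudo-random generator, all chosen here for the purpose; the function name is hypothetical. It estimates $n$ times the expected squared error of the sample mean and of the sample median.

```python
import math
import random
import statistics


def scaled_squared_error(estimator, n, reps, rng):
    """Monte Carlo estimate of n * E(error^2) when the true mean is 0
    and the observations are standard normal."""
    errors = [
        estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return n * sum(e * e for e in errors) / reps


if __name__ == "__main__":
    rng = random.Random(12345)
    mean_value = scaled_squared_error(statistics.fmean, 101, 3000, rng)
    median_value = scaled_squared_error(statistics.median, 101, 3000, rng)
    print("mean:", round(mean_value, 3), "(limit 1)")
    print("median:", round(median_value, 3), "(limit pi/2 =", round(math.pi / 2, 3), ")")
```

Being a simulation at a finite sample size, the printed figures only approximate the limiting values.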

5 A behavioralistic review of the criteria for point estimation 

It is time now to introduce the notion of consequences, or
(equivalently, I believe) of loss, thereby interpreting estimation
problems as decision problems. Let it be said then that an estimation
decision problem is an observational decision problem with the following
distinguishing feature. There is a one-to-one correspondence between
the basic acts $\mathbf{f}$ and the values attained by a real-valued
function $\lambda(i)$, such that $L(\mathbf{f}; i) = 0$, if $\mathbf{f}$
is the act that corresponds with $\lambda(i)$. It is simpler, more
suggestive, and harmless to let the number $l$ that corresponds to
$\mathbf{f}$ replace $\mathbf{f}$ itself in all further discussion of
estimation decision problems. To illustrate the new notation, it may be
said that $L(l; i) = 0$, if $l = \lambda(i)$.

I believe that any situation ordinarily said to call for (point)
estimation can be analyzed as an estimation decision problem. For
example, estimating how much paint will cover a wall may, depending on
circumstances, mean deciding how much paint to buy, what to bid for a
contract, or what number to enter in a guessing pool. Under each of
those interpretations there will be zero loss, if and, typically, only
if the estimate is “correct,” as one says.

The consequences of an estimate may, like those of many real life
decisions, be difficult to appraise. It is hard to say even in relatively
concrete situations what it will cost to misestimate the speed of light,
a particular mortality rate, or the national income. If, to revert to an
example already discussed, the estimate is to be published somewhere
for the use of whoever has a use for it, the consequences of publication
may seem beyond all reckoning. None the less, I reaffirm the conviction
that the concept of consequence measured in income or loss is
valuable in dealing with such situations, as I hope the present
treatment of estimation will illustrate. Incidentally, it seems
indifferent, as I have already said, whether loss or income is taken as
the starting point. It is easily shown that the decisions of the
idealized person of the personalistic probability theory will be the
same in two problems having possibly different income, but the same
loss, functions. This feature I would expect to be acceptable even to
objectivists, and I also think it appropriate to theories of group
decision.

I know of nothing interesting that distinguishes estimation decision
problems as a class from observational decision problems generally.
But actual estimation situations suggest certain relatively wide classes
of estimation decision problems about which interesting and valuable
conclusions can be drawn. Indeed, it will be shown in this and the next
two sections that seven of the nine listed criteria for estimation can
be justified to some extent as flowing from application of the principle
of admissibility and the minimax rule to such classes of estimation
decision problems.

Before making any real specialization, it may be most systematic to 
mention that Criterion 1 is simply an instance of the general principle, 
which we have now studied from several points of view, that nothing 
is lost by confining attention to sufficient statistics, at least if mixtures 
are allowed. 

It is clear in almost any estimation situation, even in those for which
the notion of loss is vaguest, that if two errors have the same sign the
larger entails at least as great a loss as the smaller. Analytically,

(1)  $L(l; i) \le L(l'; i)$

for $\lambda(i) \le l \le l'$ and for $\lambda(i) \ge l \ge l'$.
Situations to which (1) fails to apply can readily be imagined. William
Tell, for example, in estimating the angle by which to elevate his
cross-bow for the apple shot might have preferred a downward error of
10° to one of 1°; but such circumstances seem exceptional. Furthermore,
it is usually justifiable to assume that strict inequality holds in (1),
though there are many exceptions in which, for example, “a miss is as
good as a mile” or one hit is as good as another.

As is, I think, intuitively evident, when strict inequality holds in
(1), Criterion 3 is simply an application of the principle of
admissibility. That conclusion can be shown in complete generality
without serious difficulty, but, in compliance with the usual
mathematical limitations of this book, it will here be shown only under
the assumption that $x$ is confined to a finite number of values.

What is to be shown is this: If $\mathbf{l}$ and $\mathbf{l}'$ are a
pair of estimates satisfying the hypothesis of Criterion 3, and if (1)
holds with strict inequality, then
$L(\mathbf{l}; i) - L(\mathbf{l}'; i) \le 0$ for every $i$, with strict
inequality for some $i$. To begin the proof calculate thus:

(2)  $L(\mathbf{l}; i) - L(\mathbf{l}'; i) = \sum_{l} L(l; i)\,[P(l(x) = l \mid B_i) - P(l'(x) = l \mid B_i)]$

     $= \sum_{l < \lambda(i)} L(l; i)\,Q(l; i) + \sum_{l > \lambda(i)} L(l; i)\,Q(l; i),$

where the definition of $Q(l; i)$ is clear from the context, and where
it has been taken into account that $L(\lambda(i); i) = 0$. It will be
shown that both sums in the last part of (2) are non-positive and that
for some $i$ at least one of them is negative. Focus, for definiteness,
on the second sum. Let $l_0 = \lambda(i)$ and $l_1, l_2, \ldots$ be, in
order of increasing magnitude, the values of $l > \lambda(i)$ for which
$Q(l; i) \ne 0$. With the abbreviations $L(k) =_{Df} L(l_k; i)$,
$\Delta(k) =_{Df} L(k) - L(k - 1)$, and $Q(k) =_{Df} Q(l_k; i)$, the sum
to be investigated is

(3)  $\sum_{0 < k} L(k)\,Q(k) = \sum_{0 < k} Q(k) \sum_{0 < k' \le k} \Delta(k')$

     $= \sum_{0 < k'} \Delta(k') \sum_{k \ge k'} Q(k).$

(This rearrangement may seem bizarre on first encounter, but it is
widely used in mathematics generally and is in fact an exact analogue,
for sums, of the more familiar integration by parts, for integrals.) It
follows from (1) read with strict inequality that $\Delta(k) > 0$, and
it follows from the hypothesis of Criterion 3 that
$\sum_{k \ge k'} Q(k) \le 0$ for every $k'$, and that some such sum (or
an analogous term associated with the first sum in the last line of (2))
is strictly negative for some $i$. This completes the deduction of
Criterion 3 from the strict form of (1) and the principle of
admissibility. Essentially the same argument leads from (1) as actually
written to the modification mentioned in the note under Criterion 3.
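
The rearrangement (3) is easy to confirm on a small numerical instance. The following Python sketch is an illustration with made-up values of $L(k)$ and $Q(k)$ (assumptions chosen here, with $L(0) = 0$ as the argument requires); it also shows that an individual $Q(k)$ may be positive while every tail sum $\sum_{k \ge k'} Q(k)$ is non-positive.

```python
from fractions import Fraction


def direct_sum(L, Q):
    """Left side of (3): sum over k > 0 of L(k) Q(k)."""
    return sum(L[k] * Q[k] for k in range(1, len(L)))


def rearranged_sum(L, Q):
    """Right side of (3): sum over k' > 0 of Delta(k') times the tail
    sum of Q(k) over k >= k', where Delta(k') = L(k') - L(k' - 1)."""
    m = len(L)
    total = 0
    for k_prime in range(1, m):
        delta = L[k_prime] - L[k_prime - 1]
        tail = sum(Q[k] for k in range(k_prime, m))
        total += delta * tail
    return total


# Hypothetical values: L increases strictly from L(0) = 0; Q(1) > 0,
# yet the tail sums are -1/5, -3/10, and -1/10.
L_VALUES = [0, 1, 3, 6]
Q_VALUES = [0, Fraction(1, 10), Fraction(-2, 10), Fraction(-1, 10)]

if __name__ == "__main__":
    print(direct_sum(L_VALUES, Q_VALUES), rearranged_sum(L_VALUES, Q_VALUES))
```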

A very slight strengthening of (1), together with the minimax rule,
provides a widely applicable justification of Criterion 8 (consistency),
as will now be explained. Suppose that (1) not only holds but also is
strict, if $l = \lambda(i)$; that is, in addition to (1) suppose only
that $L(l'; i) > 0$ for all $l' \ne \lambda(i)$. In this context, let
$x(n)$ be a sequence of observations such that the minimax $L^*(n)$ of
the corresponding estimation problems approaches zero with increasing
$n$; then any sequence of minimax estimates $\mathbf{l}(n)$ is
consistent. Indeed, if the sequence $\mathbf{l}(n)$ is not consistent,
then, for some $i$, and some positive $\epsilon$ and $\delta$,

(4)  $P(|l(x(n), n) - \lambda(i)| \ge \epsilon \mid B_i) \ge \delta$

for some arbitrarily large values of $n$. This implies

(5)  $L^*(n) \ge L(\mathbf{l}(n); i) \ge \delta \min \{L(\lambda(i) + \epsilon; i),\ L(\lambda(i) - \epsilon; i)\} > 0,$

which contradicts the hypothesis.

Turn next to Criterion 5 (symmetry). Suppose that the estimation
decision problem has symmetry in the sense defined under Criterion 5.
That does not in itself really call for estimates with the same
symmetry. But, if $L$ also has the symmetry, that is, if
$L(\lambda(i'); i) = L(\lambda(Ti'); Ti)$ for all appropriate $i$, $i'$,
and $T$, then the discussion of symmetry in § 12.5 suggests that
typically there is, at any rate, a symmetrical, admissible, minimax
estimate. Whether $L$ has the requisite symmetry is a question that can
often be answered without detailed knowledge of $L$.

It is often justifiable to suppose that the function $L(l; i)$ is smooth
enough to be differentiated twice with respect to $l$, at least when $l$
is near $\lambda(i)$. This condition, though very often met, is not
quite so devoid of content as it may seem to a reader brought up in the
tradition that it makes no practical difference whether a function has a
few sharp corners because they can always be rounded off with almost no
change in the function. If, for example, $L(l; i)$ is for all
practicable purposes equal to $|l - \lambda|$, then $L$ cannot be
regarded as differentiable even once when $l = \lambda$, and the theory
to be developed here for twice differentiable $L(l; i)$'s in the
presence of extensive observation does not apply. It will therefore be
useful to digress to the consideration of an example, illustrating how
corners can arise and the phenomena that tend to round them off.

Suppose that a person must estimate the amount $\lambda$ of shelving for
books, priced at \$1.00 per foot, to be ordered for some purpose. It is
possible that the following economic analysis of the situation would be
sufficiently realistic. The person holds every foot of shelving less
than the number of feet, $\lambda$, of books to be shelved to be worth
$a > 1$ dollars, but superfluous shelving he holds to be worthless.
Formally,

(6)  $L(l; \lambda) = (a - 1)(\lambda - l)$  for $l \le \lambda$

                    $= (l - \lambda)$  for $l \ge \lambda$.

There is then a corner, or kink, at $l = \lambda$, so differentiation,
even once, is impossible.

But the following analysis is much more likely to be sufficiently
realistic. The urgency of the shelving of the books is variable. Some
would be worth shelving, even if the cost of shelving were very high; at
the other extreme, there are some that would not be worth shelving
unless the cost were very low. More fully, the value of $l$ feet of
shelving is a function $\iota(l)$ that presumably has the following
features. It is monotonically increasing, strictly concave, and twice
differentiable in $l$; $\iota(0) = 0$, $\iota(\infty) < \infty$,
$\iota'(0) > 1$. The income attached to ordering $l$ feet of shelving,
at the price \$1.00 per foot, is clearly

(7)  $I(l; i) = \iota(l) - l.$

It is maximized at the one and only value $\lambda$ for which
$d\iota(\lambda)/d\lambda = 1$, so that

(8)  $L(l; i) = [\iota(\lambda) - \lambda] - [\iota(l) - l],$

which is of course twice differentiable in $l$.

The moral of these two possible economic analyses of one example is
of wide applicability, as is well known among economists. Where a
superficial analysis suggests a kink, or even a discontinuity, in an
income function, deeper analysis will often show that the function is
smoothed out by various economic phenomena such as the inhomogeneity
and the mutual substitutability of commodities.
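
The two analyses of the shelving example can be set side by side numerically. The Python sketch below is an illustration only; the particular value function $C(1 - e^{-l/S})$ with $C = 200$ and $S = 50$, the marginal worth $a = 3$, and all function names are assumptions introduced here, chosen so that the value function is increasing, strictly concave, bounded, zero at zero, and has initial slope greater than 1.

```python
import math

A = 3.0            # hypothetical worth per foot of needed shelving, a > 1
C, S = 200.0, 50.0  # hypothetical parameters of the smooth value function


def kinked_loss(l, lam, a=A):
    """Loss (6): shortage costs (a - 1) per foot, surplus costs 1 per foot."""
    return (a - 1.0) * (lam - l) if l <= lam else (l - lam)


def value(l):
    """Assumed value of l feet of shelving; its slope at 0 is C / S = 4 > 1."""
    return C * (1.0 - math.exp(-l / S))


# The income value(l) - l is maximized where the slope of value equals 1.
LAM = S * math.log(C / S)


def smooth_loss(l):
    """Loss (8): best attainable income minus income actually attained."""
    return (value(LAM) - LAM) - (value(l) - l)


def quadratic_loss(l):
    """Second-order approximation to smooth_loss near LAM; the coefficient
    is half the second derivative of the loss at LAM, here 1 / (2 S)."""
    return (l - LAM) ** 2 / (2.0 * S)


if __name__ == "__main__":
    for d in (-1.0, 1.0):
        print(d, kinked_loss(LAM + d, LAM), smooth_loss(LAM + d), quadratic_loss(LAM + d))
```

Near the optimum the kinked loss changes at different rates on the two sides, whereas the smooth loss is nearly symmetric and well described by the quadratic term.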

To return from the digression, if $L$ is twice differentiable in $l$ (at
least when $l$ is close to $\lambda$), $L$ can be expanded in a Taylor
series thus:

(9)  $L(l; i) = L(\lambda; i) + (l - \lambda)\,\frac{\partial L}{\partial l}\Big|_{l = \lambda(i)} + \frac{1}{2}(l - \lambda)^2\,\frac{\partial^2 L}{\partial l^2}\Big|_{l = \lambda(i)} + o((l - \lambda)^2),$

where, following standard usage, $o((l - \lambda)^2)$ is a function of
$l$ and $i$, not necessarily the same from one context to another, such
that $o((l - \lambda)^2) \div (l - \lambda)^2$ approaches zero as $l$
approaches $\lambda(i)$ for fixed $i$. The first term on the right side
of (9) vanishes by the definition of estimation; the second must vanish
also, for otherwise $L$ could be negative. Therefore,

(10)  $L(l; i) = \frac{1}{2}(l - \lambda)^2\,\frac{\partial^2 L}{\partial l^2}\Big|_{l = \lambda} + o((l - \lambda)^2)$

      $= (l - \lambda(i))^2\,\alpha(i) + o((l - \lambda)^2),$

where $\alpha(i)$ is defined by the context.

In view of (10), it is plausible that $L$ may, in many problems where
estimates of great accuracy are possible, be supposed to be practically
of the form

(11)  $L(l; i) = (l - \lambda(i))^2\,\alpha(i),$

where $\alpha(i) > 0$ for every $i$. This does not exactly mean that a
reasonable $L$ can be closely approximated by functions of the form (11)
for all $l$. In particular, the absurd assumption that $L$ is unbounded
(which such approximation would typically imply) is not to be made. It
means, rather, that under favorable circumstances (11) may lead to a
reasonably good evaluation of $L(\mathbf{l}; i)$. In so far as the form
(11) can be supposed adequately to represent $L$, Criterion 2 is
obviously an application of the principle of admissibility. An
interesting discussion and application of (11) is given by Yates [Y2].

6 A behavioralistic review, continued 

Thus far, Criteria 1, 2, 3, 5, and 8 have been discussed in
behavioralistic terms. In fact, under suitable hypotheses, each has been
found to have considerable behavioralistic justification. Criteria 4 and
9 also have such justification, but my discussion of them is so bulky it
had better be isolated in a special section. As for Criteria 6 and 7,
the only ones remaining, they do not seem to me to have any serious
justification at all, as will be discussed in still another section.

Criterion 4, the recommendation of maximum-likelihood estimates, is
of extraordinary interest, for, of all the criteria of the verbalistic
tradition, it is essentially the only one that selects a unique estimate
in almost every estimation situation of practical importance. The
present section demonstrates that, in the presence of extensive
observation, maximum-likelihood estimates are often almost minimax
estimates; it also gives some analysis of Criterion 9, which refers to
efficiency. The way to these goals is roundabout; it begins with a study
of information in the technical sense mentioned in § 3.6. In this
section it will be assumed for mathematical simplicity that each
observation under discussion is confined to a finite number of values,
each having positive probability for every element of whatever partition
is under discussion.

If $B_i$ and $B_j$ are elements of a partition, not necessarily finite,
and $x$ is an observation, say, in the spirit of (3.6.11), that the
information of $j$ relative to $i$ for the observation $x$ is

(1)  $J(i, j; x) =_{Df} E\left(\log \frac{P(x \mid B_i)}{P(x \mid B_j)} \,\Big|\, B_i\right) = -E\left(\log \frac{r_j}{r_i} \,\Big|\, B_i\right)$.

The expression of $J$ in terms of likelihood ratios is important,
especially for the extension of the discussion to more general
observations than those contemplated here. The reader should, therefore,
try to bear in mind that the whole discussion could be carried on in terms
of likelihood ratios; I refrain from so doing only for momentary reasons
of notational convenience. The theory of $J$ can conveniently be presented
in a series of exercises.


Exercises 


1a. If $y$ is a contraction of $x$, then $J(i, j; x) \ge J(i, j; y)$.
With equality when? Hint:

(2)  $-E\left(\log \frac{P(x \mid B_j)}{P(x \mid B_i)} \,\Big|\, y, B_i\right) \ge -\log \frac{P(y \mid B_j)}{P(y \mid B_i)}$.

1b. $J(i, j; x) \ge 0$. With equality when?

2a. If $x_1, \cdots, x_n$ are conditionally independent, then

(3)  $J(i, j; x_1, \cdots, x_n) = \sum_s J(i, j; x_s)$.

2b. If in addition the $x_s$'s are conditionally identically distributed,
then

(4)  $J(i, j; x_1, \cdots, x_n) = nJ(i, j; x_1)$.
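The exercises above can be spot-checked numerically. The following is only a sketch under stated assumptions: the distributions `p_i`, `p_j`, `q_i`, `q_j` and the particular contraction (lumping the first two values of $x$ together) are illustrative choices, not taken from the text, and `information`, `joint`, and `contract` are hypothetical helper names.

```python
import math
from itertools import product


def information(p_i, p_j):
    """J(i, j; x) = E(log[P(x|B_i) / P(x|B_j)] | B_i) for finite distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p_i, p_j))


def joint(*dists):
    """Joint distribution of conditionally independent observations."""
    return [math.prod(ps) for ps in product(*dists)]


def contract(dist, groups):
    """Distribution of a contraction y that lumps the listed x-values together."""
    return [sum(dist[k] for k in group) for group in groups]


# Illustrative (assumed) conditional distributions of x given B_i and B_j.
p_i = [0.5, 0.3, 0.2]
p_j = [0.2, 0.3, 0.5]
# A second, independent observation with its own conditional distributions.
q_i = [0.6, 0.4]
q_j = [0.1, 0.9]

groups = [(0, 1), (2,)]  # y lumps the first two values of x together
j_x = information(p_i, p_j)
j_y = information(contract(p_i, groups), contract(p_j, groups))   # Exercise 1a
j_pair = information(joint(p_i, q_i), joint(p_j, q_j))            # Exercise 2a
j_rep = information(joint(p_i, p_i, p_i), joint(p_j, p_j, p_j))   # Exercise 2b

print(j_x, j_y, j_pair, j_rep)
```

In this instance the contraction loses information, and the information in independent observations adds, as Exercises 1 and 2 assert.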


It is interesting to evaluate the information $J(\lambda, \lambda + \Delta\lambda; x)$
where $\lambda$ and $\lambda + \Delta\lambda$ are two closely neighboring
values of the parameter of an estimation problem, supposed, for
simplicity, to be free of nuisance parameters. If $P(x \mid \lambda)$ is
continuous in $\lambda$, it is almost obvious that
$J(\lambda, \lambda + \Delta\lambda; x)$ approaches zero as $\Delta\lambda$
approaches zero. If $P(x \mid \lambda)$ is differentiable in $\lambda$, it
is easy to show further (considering that $J$ is non-negative) that even
$J(\lambda, \lambda + \Delta\lambda; x)/\Delta\lambda$ approaches zero as
$\Delta\lambda$ approaches zero. But in this case much more can and will
be shown, namely,

(5)  $\lim_{\Delta\lambda \to 0} \frac{J(\lambda, \lambda + \Delta\lambda; x)}{\Delta\lambda^2} = \frac{1}{2} H(\lambda; x) =_{Df} \frac{1}{2} E\left(\left[\frac{\partial \log P(x \mid \lambda)}{\partial\lambda}\right]^2 \,\Big|\, \lambda\right)$.

The function $H$ is generally, following Fisher, called information, but
here we had better call it differential information. Chronologically, as
explained at the end of § 3.6, the concept of differential information is
older than that here called simply information and of which it is,
according to (5), a limiting case.

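The demonstration of (5) begins with the consideration that

(6)  $\log(1 + t) = t - \frac{1}{2}t^2 + o(t^2)$.

Therefore,

(7)  $\log \frac{P(x \mid \lambda + \Delta\lambda)}{P(x \mid \lambda)} = \log\left\{1 + \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}\right\}$

$\qquad = \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}$

$\qquad\quad - \frac{1}{2}\left\{\frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}\right\}^2 + o(\Delta\lambda^2)$.

Since the expected value given $\lambda$ of the first-order term in the
second line of (7) is easily seen to be exactly zero, it will be tactful
to leave that term alone; but the second-order term may be approximated
thus:

(8)  $\left\{\frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}\right\}^2 = \left\{\frac{\Delta\lambda}{P(x \mid \lambda)} \frac{\partial P(x \mid \lambda)}{\partial\lambda} + o(\Delta\lambda)\right\}^2 = \left[\frac{\partial \log P(x \mid \lambda)}{\partial\lambda}\right]^2 \Delta\lambda^2 + o(\Delta\lambda^2)$.

Therefore,

(9)  $J(\lambda, \lambda + \Delta\lambda; x) = \frac{1}{2} H(\lambda; x)\Delta\lambda^2 + o(\Delta\lambda^2)$,

which establishes (5).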
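The limit (5) can be watched numerically in the simplest case. The sketch below assumes, purely for illustration, a single two-valued observation with $P(1 \mid \lambda) = \lambda$, for which the differential information is $1/\lambda(1 - \lambda)$; the function names and the value $\lambda = 0.3$ are hypothetical choices, not part of the text.

```python
import math


def bernoulli_information(lam, lam2):
    """J(lam, lam2; x) for one observation x with P(1 | lam) = lam."""
    return (lam * math.log(lam / lam2)
            + (1 - lam) * math.log((1 - lam) / (1 - lam2)))


def bernoulli_differential_information(lam):
    """H(lam; x) = E([d log P(x | lam) / d lam]^2 | lam) = 1 / (lam (1 - lam))."""
    return 1 / (lam * (1 - lam))


lam = 0.3
steps = (1e-1, 1e-2, 1e-3)
# Ratios J(lam, lam + d; x) / d^2 for shrinking d, to be compared with H / 2.
ratios = [bernoulli_information(lam, lam + d) / d**2 for d in steps]
half_h = 0.5 * bernoulli_differential_information(lam)
errors = [abs(r - half_h) for r in ratios]

print(ratios, half_h)
```

The ratios approach $\frac{1}{2}H(\lambda; x)$ as the step shrinks, as (9) leads one to expect.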

More exercises 

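3. If the $k$th derivative ($k > 0$) with respect to $\lambda$ of
$P(x \mid \lambda)$ exists for every $x$, then

(10)  $E\left(\frac{1}{P(x \mid \lambda)} \frac{\partial^k}{\partial\lambda^k} P(x \mid \lambda) \,\Big|\, \lambda\right) = \frac{\partial^k}{\partial\lambda^k} \sum_x P(x \mid \lambda) = 0$.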



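4. If the requisite second derivative exists, then

(11)  $H(\lambda; x) = -E\left(\frac{\partial^2}{\partial\lambda^2} \log P(x \mid \lambda) \,\Big|\, \lambda\right)$.

5. If $y$ is a contraction of $x$ (and $H(\lambda; x)$ is well defined),
then $H(\lambda; y) \le H(\lambda; x)$.

Remark. The inequality is obvious in the light of Exercise 1a and the
first part of (5). But it can also be derived from the following
application of Theorem 1 of Appendix 2, which is useful in the next
exercise.

(12)  $\left[\frac{1}{P(y \mid \lambda)} \frac{\partial P(y \mid \lambda)}{\partial\lambda}\right]^2 = \left\{E\left(\frac{1}{P(x \mid \lambda)} \frac{\partial P(x \mid \lambda)}{\partial\lambda} \,\Big|\, y, \lambda\right)\right\}^2 \le E\left(\left[\frac{1}{P(x \mid \lambda)} \frac{\partial P(x \mid \lambda)}{\partial\lambda}\right]^2 \,\Big|\, y, \lambda\right)$,

with equality for every $y$ and $\lambda$, if and only if
$\frac{\partial}{\partial\lambda} \log P(x \mid \lambda)$ can be expressed
as a function of $y$ and $\lambda$ alone.

6a. If $y$ is a contraction of $x$, $H(\lambda; x) = H(\lambda; y)$ for
every $\lambda$, if and only if $y$ is sufficient for $x$.

6b. $H(\lambda; x) = 0$ for every $\lambda$, if and only if $x$ is utterly
irrelevant.

7a. If $x_1, \cdots, x_n$ are independent given $\lambda$, then

(13)  $H(\lambda; x_1, \cdots, x_n) = \sum_s H(\lambda; x_s)$.

7b. If, in addition, the $x_s$'s are identically distributed given
$\lambda$, then

(14)  $H(\lambda; x_1, \cdots, x_n) = nH(\lambda; x_1)$.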

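8. If $l$ is a real-valued contraction of $x$, and $H(\lambda; x)$ is well
defined, then

(a)

(15)  $\frac{\partial}{\partial\lambda} E(l \mid \lambda) = E\left([l - \lambda] \frac{\partial \log P(l(x) \mid \lambda)}{\partial\lambda} \,\Big|\, \lambda\right)$.

(b)

(16)  $E([l - \lambda]^2 \mid \lambda)\, H(\lambda; l) \ge \left\{\frac{\partial}{\partial\lambda} E(l \mid \lambda)\right\}^2$,

with equality if and only if

(17)  $\frac{\partial}{\partial\lambda} \log P(l \mid \lambda) = (l - \lambda)h$

for some constant $h$. Hint: Use Exercise 3 and apply the Schwarz
inequality to (15).

(c) If $H(\lambda; x) > 0$, then

(18)  $E([l - \lambda]^2 \mid \lambda) \ge \left\{\frac{\partial}{\partial\lambda} E(l \mid \lambda)\right\}^2 \Big/ H(\lambda; x)$.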
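Inequality (18) can be checked directly for a binomial observation. What follows is a sketch under assumptions: $x$ is taken to be binomial with `n = 20` trials (so that $H(\lambda; x) = n/\lambda(1 - \lambda)$), the three estimates are arbitrary illustrations, the derivative of $E(l \mid \lambda)$ is approximated by a central difference, and all function names are hypothetical.

```python
import math


def binomial_pmf(n, lam):
    """P(x | lam) for x = 0, ..., n."""
    return [math.comb(n, x) * lam**x * (1 - lam)**(n - x) for x in range(n + 1)]


def expected_estimate(estimate, n, lam):
    """E(l | lam) by exact enumeration."""
    return sum(p * estimate(x) for x, p in enumerate(binomial_pmf(n, lam)))


def mean_square_error(estimate, n, lam):
    """E([l - lam]^2 | lam) by exact enumeration."""
    return sum(p * (estimate(x) - lam)**2
               for x, p in enumerate(binomial_pmf(n, lam)))


def information_bound(estimate, n, lam, step=1e-5):
    """Right side of (18), with the derivative taken by central difference."""
    slope = (expected_estimate(estimate, n, lam + step)
             - expected_estimate(estimate, n, lam - step)) / (2 * step)
    h = n / (lam * (1 - lam))  # differential information of a binomial x
    return slope**2 / h


n = 20
estimates = {
    "x/n": lambda x: x / n,                  # unbiased; attains the bound
    "(x+1)/(n+2)": lambda x: (x + 1) / (n + 2),
    "constant 1/2": lambda x: 0.5,           # ignores x; the bound is zero
}
grid = [k / 20 for k in range(1, 20)]
gaps = {
    name: min(mean_square_error(est, n, lam) - information_bound(est, n, lam)
              for lam in grid)
    for name, est in estimates.items()
}
print(gaps)
```

The smallest gap is zero, up to rounding, for $x/n$ and positive for the biased estimate $(x+1)/(n+2)$; the bound constrains biased estimates only through the derivative of their expected value.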

Exercise 8c is an important, and now famous, inequality. It, together
with its n-dimensional generalization, has been called the Cramér-Rao
inequality because of its independent publication by Rao and Cramér in
1945 and 1946 respectively (see [H6]). But the name is not at all well
justified historically. Fréchet presented the inequality in 1943 [F8],
and Darmois extended Fréchet's inequality to n dimensions, at least for
unbiased estimates, in a publication [D1] not later than Rao's. The
inequality has also, though I think erroneously, been attributed to an
early paper by Aitken and Silverstone [A1], and to one by Doob [D10]. My
point is, of course, not to give a definitive history of the inequality,
but merely to suggest that for the time being an impersonal name would be
better. I tentatively propose calling it the information inequality. Some
recent references pertinent to the information inequality and other
topics treated thus far in this section are [W15], [M5], [C6], and [H6].
The techniques used in the remainder of this section, which revolve
around the information inequality, were published posthumously by Wald
[W5].

The information inequality has an important bearing on application of
the minimax rule to estimation, of which the following theorem may, in
view of (5.11), be taken as a first illustration.

Theorem 1

Hyp. 1. For every $\lambda$ in a closed interval of length $\delta$,
$H(\lambda; x) \le H$, where $H$ is a constant.

2. $l$ is a real-valued contraction of $x$.

Concl. For some $\lambda$ in the interval,
$E((l - \lambda)^2 \mid \lambda) \ge (H^{1/2} + 2/\delta)^{-2}$.

Proof. Suppose that the theorem is false. Then according to Exercise 8c,

(19)  $H^{1/2}\left(H^{1/2} + \frac{2}{\delta}\right)^{-1} > \frac{\partial}{\partial\lambda} E(l \mid \lambda)$

for every $\lambda$ in the interval. Therefore,

(20)  $\frac{\partial}{\partial\lambda}[\lambda - E(l \mid \lambda)] > 1 - H^{1/2}\left(H^{1/2} + \frac{2}{\delta}\right)^{-1} = \frac{2}{\delta H^{1/2} + 2}$

for every $\lambda$ in the interval. Therefore, at one end of the interval
or the other,

(21)  $|\lambda - E(l \mid \lambda)| > \frac{\delta}{\delta H^{1/2} + 2} = \left(H^{1/2} + \frac{2}{\delta}\right)^{-1}$.

This leads to a contradiction through the well-known inequality

(22)  $E([l - \lambda]^2 \mid \lambda) \ge \{E(l - \lambda \mid \lambda)\}^2 = |\lambda - E(l \mid \lambda)|^2$,

which can be derived as a direct application of Theorem 1 of Appendix 2,
or of the Schwarz inequality, or of the useful identity

(23)  $E([l - \lambda]^2 \mid \lambda) = V(l \mid \lambda) + \{E(l - \lambda \mid \lambda)\}^2$. ♦
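As an illustration of Theorem 1, the sketch below takes $x$ binomial with 20 trials and the interval $[0.4, 0.6]$, so that $\delta = 0.2$ and $H$ may be taken as the largest value of $n/\lambda(1 - \lambda)$ on the interval. It assumes that the lower bound in the conclusion is $(H^{1/2} + 2/\delta)^{-2}$, the value the proof requires; the estimates, the grid, and the function names are all hypothetical choices.

```python
import math


def mse(estimate, n, lam):
    """E((l - lam)^2 | lam) for a binomial observation, by enumeration."""
    return sum(math.comb(n, x) * lam**x * (1 - lam)**(n - x)
               * (estimate(x) - lam)**2 for x in range(n + 1))


def theorem1_bound(h_max, delta):
    """Lower bound (H^(1/2) + 2/delta)^(-2), as assumed for Theorem 1."""
    return (math.sqrt(h_max) + 2 / delta) ** -2


n, low, high = 20, 0.4, 0.6
delta = high - low
grid = [low + delta * k / 200 for k in range(201)]  # includes both end points
h_max = max(n / (lam * (1 - lam)) for lam in grid)  # H(lam; x) <= h_max on grid
bound = theorem1_bound(h_max, delta)

estimates = {
    "x/n": lambda x: x / n,
    "(x+1)/(n+2)": lambda x: (x + 1) / (n + 2),
    "midpoint": lambda x: (low + high) / 2,  # ignores the observation
}
# Largest mean square error of each estimate over the interval.
worst = {name: max(mse(est, n, lam) for lam in grid)
         for name, est in estimates.items()}
print(bound, worst)
```

Even the estimate that ignores $x$ and always guesses the midpoint cannot push its largest error below the bound, because its error at the ends of the interval is $(\delta/2)^2$.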

In the remaining portion of this section, let it be understood that:

1. The $x_s$'s are an infinite sequence of observations that are, given
$\lambda$, identically distributed and independent.

2. $x(n) = \{x_1, \cdots, x_n\}$ for $n = 1, 2, \cdots$.

3. $l(n)$ is a real-valued contraction of $x(n)$.

The contraction $l(n)$ is to be thought of as an estimate of $\lambda$
based on observation of $x(n)$. In the spirit of the minimax theory it is
really mixed, rather than ordinary, estimates that should be treated
here. But this entails no essential change in the following discussion
once it is recognized that a mixed estimate is, in effect, an ordinary
estimate based on observation of $y(n) =_{Df} (l(n), x(n))$, where $x(n)$
is sufficient for $y(n)$, so that $H(\lambda; y(n)) = H(\lambda; x(n))$
for all $\lambda$.

4. $\epsilon$ and $\delta$ are positive numbers.

5. $\Lambda_0$ is a closed interval of length $\delta$ contained in the
range of $\lambda$ and including a given value $\lambda_0$.

The next theorem shows that, if $L(l; \lambda)$ is of the form (5.11),
$L(l(n); \lambda)$ cannot ordinarily be kept much smaller than
$\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for large $n$, even in a small
interval about $\lambda_0$.

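Theorem 2. If $H(\lambda; x_1)$ is continuous and positive at
$\lambda_0$, and if $\alpha(\lambda)$ is a non-negative function
continuous at $\lambda_0$, then, for sufficiently large $n$,
$E((l(n) - \lambda)^2 \alpha(\lambda) \mid \lambda) \ge (1 - \epsilon)\alpha(\lambda_0)/nH(\lambda_0; x_1)$
for some $\lambda \in \Lambda_0$.

Proof. There is no loss of generality in supposing that $\epsilon < 1$
and $\Lambda_0$ such that, for $\lambda \in \Lambda_0$,
$\alpha(\lambda) \ge \alpha(\lambda_0)(1 - \epsilon)^{1/2}$ and
$H(\lambda; x_1)^{1/2} \le H(\lambda_0; x_1)^{1/2}[1 + (1 - \epsilon)^{-1/4}]/2$.
Using Exercise 7b,

(24)  $H(\lambda; x(n))^{1/2} = n^{1/2} H(\lambda; x_1)^{1/2} \le \frac{n^{1/2}}{2} H(\lambda_0; x_1)^{1/2}[1 + (1 - \epsilon)^{-1/4}]$

for $\lambda \in \Lambda_0$. By Theorem 1, if
$n \ge 16/\delta^2 H(\lambda_0; x_1)[(1 - \epsilon)^{-1/4} - 1]^2$, then

(25)  $E((l(n) - \lambda)^2 \mid \lambda) \ge \frac{(1 - \epsilon)^{1/2}}{nH(\lambda_0; x_1)}$

for some $\lambda \in \Lambda_0$. ♦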

The next theorem extends Theorem 2 to practically any loss function that
is twice differentiable in $l$ for $l$ and $\lambda$ close to
$\lambda_0$.

Theorem 3

Hyp. 1. $H(\lambda; x_1)$ is positive and continuous at $\lambda_0$.

2. $\alpha(\lambda) =_{Df} \frac{1}{2} \frac{\partial^2}{\partial l^2} L(l; \lambda)\Big|_{l = \lambda}$
is continuous at $\lambda_0$.

3. Inequality (5.1) holds for $\lambda$ in $\Lambda_0$.

Concl. For sufficiently large $n$,
$L(l(n); \lambda) \ge (1 - \epsilon)\alpha(\lambda_0)/nH(\lambda_0; x_1)$
for some $\lambda \in \Lambda_0$.

Proof. It may be supposed without loss of generality that
$\epsilon < 1$, and that, for $l, \lambda \in \Lambda_0$,
$L(l; \lambda) \ge (1 - \epsilon)^{1/2} \alpha(\lambda)(l - \lambda)^2$.

It may also be supposed that $l(x; n) \in \Lambda_0$. This is so, because
it would suffice to prove the theorem for a new estimate $l'(n)$, where
$l'(x; n)$ is defined to be the number in $\Lambda_0$ closest to
$l(x; n)$, which in turn follows from the fact that
$L(l'(n); \lambda) \le L(l(n); \lambda)$ for $\lambda \in \Lambda_0$.

These suppositions having been made, the theorem is a direct consequence
of Theorem 2. ♦

Corollary 1. If $L(l; \lambda)$ satisfies (5.1) and has two derivatives
with respect to $l$ continuous in $\lambda$ for every $\lambda$ and for
every $l$ sufficiently close to $\lambda$, and if $H(\lambda; x_1)$ is
continuous and positive, then, for sufficiently large $n$,

(26)  $L^*(n) \ge (1 - \epsilon) \sup_\lambda \alpha(\lambda)/nH(\lambda; x_1)$,

where $L^*(n)$ is the minimax value of the estimation decision problem
derived from $L(l; \lambda)$ and $x(n)$, unless the supremum in question
is infinite, in which case $nL^*(n)$ approaches infinity.

Of course, it would be enough to assume only that $L(l; \lambda)$ and
$H(\lambda; x_1)$ are well behaved at some sequence of values of
$\lambda$ on which the supremum in question is approached. In particular,
if the supremum is actually attained at some $\lambda$, they need only be
well behaved there.
Now, turning to the sequence of maximum-likelihood estimates, let them be
denoted for the moment by $\hat{l}(n)$. It is known that under rather
general hypotheses $n^{1/2}(\hat{l}(n) - \lambda)$ is asymptotically
normal about zero with asymptotic variance $H^{-1}(\lambda; x_1)$.† This
suggests, and examples tend to confirm, that, under some supplementary
conditions,

(27)  $\lim_{n \to \infty} nE((\hat{l}(n) - \lambda)^2 \mid \lambda) = \frac{1}{H(\lambda; x_1)}$.

Indeed, one set of conditions implying (27) is stated in [W5], but one
that seems difficult to apply. It can be shown that (27), together with
the usual asymptotic behavior of $\hat{l}(n)$, implies

(28)  $\lim_{n \to \infty} nL(\hat{l}(n); \lambda) = \frac{\alpha(\lambda)}{H(\lambda; x_1)}$,

provided, for example, that $L(l; \lambda)$ is bounded for each $\lambda$
and that the second derivative of $L(l; \lambda)$ with respect to $l$
exists when $l = \lambda$. Easily applied rigorous theorems implying
(28), much less (27), do not seem to have been formulated yet, but
examples suggest that, under conditions general enough for many
applications, (28) actually does hold uniformly, in the sense that, for
$n$ sufficiently large,

(29)  $\frac{(1 - \epsilon)\alpha(\lambda)}{nH(\lambda; x_1)} \le L(\hat{l}(n); \lambda) \le \frac{(1 + \epsilon)\alpha(\lambda)}{nH(\lambda; x_1)}$

for all $\lambda$ simultaneously. If (29) holds, then, in view of
Corollary 1, $\hat{l}(n)$ is nearly minimax for large $n$ in the sense
that

(30)  $L^*(n) \ge (1 - \epsilon) \sup_\lambda L(\hat{l}(n); \lambda)$.


Good examples can be based on (a) of Tables 3.1 and 4.1, letting
$L(l; p)$ be any loss function having two continuous derivatives in $l$
throughout $0 \le l, p \le 1$. In particular, the example discussed in
§ 13.4 arises, if $L(l; p) = (l - p)^2$. It can be argued that the
phenomenon discussed in connection with that example is probably not
rare; because, for minimax $l(n)$, $L(l(n); \lambda)$ is, judging from
examples, often constant and, therefore, nearly equal to
$\sup_\lambda \alpha(\lambda)/nH(\lambda; x_1)$, but
$L(\hat{l}(n); \lambda)$ closely follows the rise and fall of
$\alpha(\lambda)/nH(\lambda; x_1)$.

† Some key references for the asymptotic behavior of $\hat{l}(n)$ are
[K2], [C9], [L3], [W10], [N4]. The literature on this subject is
extraordinarily complicated. There are acknowledged mathematical mistakes
in some of its most sophisticated publications; others prove much less
than any but the most attentive reader would be led to suppose; few give
an adequate statement of their relations to their predecessors; and those
that make serious pretensions to rigor involve complicated hypotheses.
For documentation of this lament see [N4], [W4], and 003]
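The contrast just described can be made concrete for a binomial observation with squared-error loss. The sketch below assumes that the minimax estimate takes the standard form $(x + \sqrt{n}/2)/(n + \sqrt{n})$, which has constant risk $1/4(1 + \sqrt{n})^2$; § 13.4 is not reproduced here, so treating this as the estimate discussed there is an assumption, as are the sample size and the names used.

```python
import math


def risk(estimate, n, p):
    """Exact mean square error E((estimate(x) - p)^2 | p) for binomial x."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
               * (estimate(x) - p)**2 for x in range(n + 1))


n = 400
root = math.sqrt(n)


def mle(x):
    """Maximum-likelihood estimate of p."""
    return x / n


def minimax(x):
    """Assumed standard minimax estimate of p under squared error."""
    return (x + root / 2) / (n + root)


grid = [k / 20 for k in range(21)]
mle_risk = [risk(mle, n, p) for p in grid]
minimax_risk = [risk(minimax, n, p) for p in grid]
# alpha(p) / (n H(p; x_1)) with alpha = 1 and H(p; x_1) = 1 / (p (1 - p)).
bound = [p * (1 - p) / n for p in grid]

for p, a, b in zip(grid, mle_risk, minimax_risk):
    print(f"p={p:.2f}  mle={a:.6f}  minimax={b:.6f}")
```

The risk of the maximum-likelihood estimate follows $p(1 - p)/n$, falling well below the constant minimax risk away from $p = 1/2$ and exceeding it slightly near $p = 1/2$.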

Turn now to Criterion 9, efficiency. It seems difficult to defend the
criterion as it has been defined in connection with (4.8); for what
virtue is there in the asymptotic normality required by (4.8)? It is
perhaps noteworthy that the sequence of minimax estimates, $p_1(n)$,
arising in connection with § 13.4 does not satisfy (4.8). Indeed,
(13.4.3) implies that $n^{1/2}(p_1(n) - p)$ is asymptotically normal not
about zero, but about $(\frac{1}{2} - p)$.

It is my impression that the essence of the efficiency concept resides
not in asymptotic normality, but in the overall behavior of the mean
square error of a sequence of estimates. I therefore propose tentatively
to modify the definition and to call a sequence of estimates $l(n)$
efficient, if and only if its mean square error behaves at least as well
as can typically be expected for a sequence of maximum-likelihood
estimates.

Formally, I propose to call $l(n)$ efficient, if and only if, for $n$
sufficiently large,

(31)  $E([l(n) - \lambda]^2 \mid \lambda) \le \frac{1 + \epsilon}{nH(\lambda; x_1)}$

for every $\lambda$ simultaneously.

I think the main objection that is likely to be raised to this proposed
definition is associated with the possibility that in some problems of
theoretical, and perhaps also of practical, importance (31) is not
satisfied by any sequence of estimates whatsoever, though the
maximum-likelihood sequence is efficient in the "official" sense. In such
a problem, are the maximum-likelihood estimates not as good for all
practical purposes for sufficiently large $n$ as though their variances
were actually equal to those of the normal distributions to which they
approximate? It is natural to think so by analogy with other contexts in
the theory of probability, but approximate normality is actually no
substitute for (31) in the present context. The next paragraph is devoted
to an example illustrating the inadequacy of asymptotic variance as a
measure of asymptotic loss. It can be skipped without loss by anyone not
interested in such technicalities.

The best example I have been able to construct is derived from a sequence
of observations that is not a standard sequence. Whether the interesting
features that it exhibits can actually be realized by standard sequences,
I do not know, but the example will do to illustrate the issue. Let
$y(n)$ be any real random variable subject to the density
$n^{1/2}\phi((y - \lambda)n^{1/2}; n)$, defined thus: $\phi(z; n)$ is the
standard normal density

inside the interval [-5(n), 5(?i)], 5(n) being such that the standard 
normal probability of this interval is (1 — (j>(z, n) = 2~^5(2n)/4 

for 5(2?i) ^ I 2 I ^ n'^ , n) is so defined elsewhere as to be a sym- 

metric positive probability density with the first two moments finite, 
with a bounded derivative approaching zero like with increasmg z, 
and with unique absolute maximum at z = 0 It is evident that 
(yW asymptotically normal about zero with umt variance » 

The information $H(\lambda; y(n))$ is well defined (even according to the
strict conditions imposed by Cramér, Lemma 1, Section 32.2 of [C9]). The
maximum-likelihood estimates of $\lambda$ are $y(n)$, and these are also
(according to Theorem 3.3 of [G1]) minimax for the simple quadratic loss
function $(l - \lambda)^2$. But

(32) " i?([y(n) - X]^ | X) = E(y(nf \ 0) 

> 2n^ J n) dy 

Si2n)n-^ 

= - 3(2w)w-l^] S(2n), 

which does not satisfy (31). Even for the bounded, and therefore more
realistic, loss function,

(33)  $L(l; \lambda) = \min\{1, [l - \lambda]^2\}$,

it follows easily from Theorem 3.3 of [G1] that every estimate must
somewhere incur a loss at least as great as the lower bound established
by (32). To summarize, there are no estimates efficient in the sense of
(31), nor even in the sense that would arise from (31) on replacing the
simple quadratic loss function by a bounded loss function; the sequence
of estimates $y(n)$ is efficient in the official sense, so to speak, but
does not, of course, result in losses of the order of $n^{-1}$.

What can be said in positive justification of the criterion of efficiency
as defined by (31) or the like? Roughly, the elements of such a sequence
nearly dominate every estimate for every smooth loss function. A little
more precisely, for large $n$, the loss associated with an element of a
sequence efficient in the sense of (31) is at most larger by a small
fraction than that of any other estimate, except possibly in some short
intervals.† The maximum loss of such an element is at most larger by a
small fraction than the minimax loss, so the elements of the sequence are
typically nearly minimax. Moreover, they typically have considerably
smaller losses than any minimax estimate, except in short intervals that
are typically very improbable a priori in the personal sense. Thus the
principle of admissibility, the minimax rule, and the personalistic
concept of probability combine to suggest that efficiency as defined by
(31) is a promising guide in the search for good estimates.

† It has actually been demonstrated that the total length of these
exceptional intervals (within any fixed interval) is small [L3].

An extensive critique of the concept of efficiency, including much
material on its history, has been given by LeCam in [L3], which
unfortunately was not available to me in its entirety as I wrote this
section. R. A. Fisher's name is the most prominent in the history of
maximum-likelihood estimation and efficiency. Some historical details are
given in [N4] and on p. 45 of Vol. II of [K2].

7 A behavioralistic review, concluded

Criteria 6 (unbiasedness) and 7 are now the only ones in the list for
which I have not suggested some justification in terms of the theory of
decision problems, and, indeed, I cannot. Unbiased estimates fascinate
many theoretical statisticians, including myself, and the study of them
undoubtedly has certain valuable by-products. Yet it is now widely agreed
that a serious reason to prefer unbiased estimates seems never to have
been proposed.

Three weak defenses are sometimes heard. First, unbiasedness is asserted
to have an intuitive appeal; whether it does or not depends, of course,
on the experience of the intuiter. Second, averages of increasingly many
unbiased estimates are typically consistent. If this is a virtue, it is a
limited one and pertains to the unbiased estimate not as an estimate, but
as a step in the definition of other estimates. Third, an allusion is
made to equity. If, for example, it has been agreed that one party will
buy a sack of sugar from another at so much per pound, it seems fair that
the nominal weight of the sack be determined by unbiased estimate. This
ethical conclusion could perhaps be given some justification in terms of
approximately linear utility functions or a long-run argument, though
there is danger of falling into such pitfalls as the conclusion that
accuracy is unimportant for equity, and it might find some application in
the theory of barter, but it seems, at best, tangential to estimation in
the sense of the present chapter.

For a proper appraisal of the criterion of unbiasedness it should be
realized that, even if $\lambda$ admits an unbiased estimate, many
not-at-all pathological functions of $\lambda$ (which can in turn be
regarded as parameters) may fail to do so, and that such unbiased
estimates as $\lambda$ does admit may be preposterous. These phenomena
are both illustrated by the following simple example. Let $x$ be confined
to two values, say 1 and 2; let
$P(1 \mid \lambda) = 1 - P(2 \mid \lambda) = \lambda$; and let $\lambda$
be confined to the interval [1/3, 2/3]. Then, by definition, $l$ is an
unbiased estimate of $\phi(\lambda)$, if and only if
$l(1)\lambda + l(2)(1 - \lambda) = l(2) + (l(1) - l(2))\lambda = \phi(\lambda)$,
a condition that can be met, if and only if $\phi$ is linear. Suppose,
for example, $\phi(\lambda) = \lambda$ for every $\lambda$; then
$l(1) = 1$, $l(2) = 0$ defines the only unbiased estimate of
$\phi(\lambda)$. This estimate is worse, according to an emphatic variant
of Criterion 3, than the biased estimate $l'$ such that $l'(1) = 2/3$ and
$l'(2) = 1/3$, for $l'$ (when it errs at all) errs in the same direction
as $l$, but never nearly as far.
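The claims of this small example are easy to confirm with exact arithmetic. In the sketch below the grid of $\lambda$ values spanning [1/3, 2/3] and the helper `expectation` are illustrative assumptions; the two estimates are those named in the text.

```python
from fractions import Fraction as F


def expectation(estimate, lam):
    """E(l | lam) when P(1 | lam) = lam and P(2 | lam) = 1 - lam."""
    return estimate[1] * lam + estimate[2] * (1 - lam)


unbiased = {1: F(1), 2: F(0)}        # the only unbiased estimate of lam
biased = {1: F(2, 3), 2: F(1, 3)}    # the competing biased estimate l'

# Eleven equally spaced values of lam from 1/3 to 2/3 inclusive.
grid = [F(1, 3) + F(k, 30) for k in range(11)]

is_unbiased = all(expectation(unbiased, lam) == lam for lam in grid)
biased_is_biased = any(expectation(biased, lam) != lam for lam in grid)

# l' errs in the same direction as l (or not at all) and always by less.
same_direction_and_closer = all(
    (biased[x] - lam) * (unbiased[x] - lam) >= 0
    and abs(biased[x] - lam) < abs(unbiased[x] - lam)
    for lam in grid for x in (1, 2)
)
print(is_unbiased, biased_is_biased, same_direction_and_closer)
```

On every grid point and for either observation, the error of $l'$ is smaller in magnitude than that of $l$ by exactly 1/3.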

As for Criterion 7, it is on first encounter appealing to postulate that,
if $l$ is usually closer to $\lambda$ than $l'$ is, then $l$ is better
than $l'$. But, speaking at least for myself, the initial appeal of
Criterion 7 seems to have been bound up with the conjecture that
Criterion 7 is in some sense of the same sort as Criterion 3. The example
given under Criterion 7 almost entirely evaporates the conjecture, and
with it the appeal.

In the paper [P5] in which the criterion is put forward for consideration
and exploration, Pitman mentions that the criterion seems acceptable in
contexts where "the devil takes the hindmost." This allusion to the devil
seems to offer no justification for the criterion as a criterion of
estimation, for I understand the allusion to refer only to the following
kind of decision problem, which is quite remote from estimation as
ordinarily understood and is hardly ever encountered. A person must
choose between $l$ and $l'$, winning a prize if the estimate of his
choice falls closer to $\lambda$ than does the other one.

According to Pitman, the relationship of "better than," or "closer than"
as he calls it, defined by Criterion 7, is not necessarily transitive. He
argues, I think with some justice, that this breakdown of transitivity
does not in itself invalidate the criterion when the criterion is applied
to select the "best" from some prescribed class of estimates, but "best"
cannot here be taken literally.

Criterion 7 is unusual in that it depends on the joint conditional
distributions of pairs of estimates rather than on the distributions of
each estimate considered separately. On any ordinary interpretation of
estimation known to me, it can be argued (as it was under Criterion 3)
that no criterion need depend on more than the separate distributions.

Testing

1 Introduction 

In principle, this chapter on the statistical process of testing (often
referred to more fully as making tests of hypotheses or significance
tests) might have been organized on the pattern of the preceding chapter
on point estimation: a statement of verbalistic ideas, followed by
motivation and criticism in terms of behavioralistic ideas. But I am
dissuaded from repeating that pattern by several considerations. It
would, in the first place, be needlessly repetitious. Thus, in the
presence of the preceding chapter I need mention only in passing that
sufficient statistics and symmetry play the same role in testing as in
other observational decision problems, and that a certain scheme of
testing, closely related to maximum-likelihood estimation, has
asymptotic, or large sample, virtues. Again, the pattern of the preceding
chapter is less attractive here, because the criteria for tests developed
in the verbalistic tradition do not on the whole seem to have such
satisfying behavioralistic motivation as do their counterparts in the
theory of point estimation. Finally, it is inappropriate to attempt
anything like a complete list of verbalistic criteria for tests here,
especially in view of the availability of two excellent and mutually
complementary key references (Chapters 21, 26, and 27 of [K2], and [L4]).

The organization actually adopted is this. First, testing and criteria
for tests are discussed from a frankly behavioralistic viewpoint. In this
discussion ideas stemming from the verbalistic tradition are used freely,
and some criteria of the verbalistic tradition are criticized. Second, an
attempt is made to analyze some of the important statistical situations
to which the theory of testing is ordinarily applied. It is becoming
increasingly recognized that many of these applications are very crude,
and that their replacement by sounder procedures constitutes some of the
most important and provocative statistical problems of today.

Terms introduced in boldface in this chapter are among the most frequent
in ordinary statistical usage. The definitions given are intended to be
in reasonable accord with that usage, but some small concessions are made
to the particular form in which the theory of testing is expressed here.

2 A theory of testing 

Verbalistically, the problem of testing means to guess, on the basis of
observation, which of two disjoint and mutually exhaustive hypotheses
obtains. Behavioralistically, this would generally be agreed to point to
the definition: A testing problem is an observational decision problem
derived from exactly two basic acts $f_0$ and $f_1$. These two basic acts
are called (for a reason that will soon be clear) accepting and rejecting
the null hypothesis, respectively.

Considered abstractly as bilinear games, testing problems may, so far as
I know, have no special feature beyond the uninteresting one that one of
two $f$'s is appropriate to each $i$. But, considered as observational
problems, testing problems do present some interesting special features.
In the first place, since at least one of the two basic acts is
appropriate to each $i$, the set $I$ of all $i$'s can be partitioned into
three sets, $H_0$, $H_1$, and $N$ defined thus:

(1)  $L(f_0; i) = 0$ and $L(f_1; i) > 0$ for $i \in H_0$,

     $L(f_0; i) > 0$ and $L(f_1; i) = 0$ for $i \in H_1$,

     $L(f_0; i) = 0$ and $L(f_1; i) = 0$ for $i \in N$.

When it is recalled that the $i$'s correspond to a partition of $S$, the
sets $H_0$, $H_1$, and $N$ may, with a slight clash of logical gears, be
regarded as three events partitioning $S$. The traditional names of $H_0$
and $H_1$ are the null and the alternative hypothesis, respectively; $N$,
being quite unimportant and often either ignored or made vacuous by some
trick of definition, has no such name. Rejecting the null hypothesis when
it does in fact obtain and accepting it when it does not obtain are
called errors, more specifically errors of the first and second kind,
respectively.

A test is a derived act of a testing problem. A test may conveniently be
identified with the real-valued contraction $z$ of the observation $x$,
such that $z(x)$ is the probability prescribed by the test for rejection
of the null hypothesis in case $x$ is observed. An unmixed test (which
was until recently the only kind contemplated) corresponds to a $z$
confined to the two values 0 and 1, which respectively imply outright
acceptance and rejection of the null hypothesis.

The loss associated with the test $z$ when $i$ obtains is clearly

(2)  $L(z; i) = L(f_0; i)E(1 - z \mid i) + L(f_1; i)E(z \mid i)$

     $\quad = L(f_1; i)E(z \mid i)$ for $i \in H_0$

     $\quad = L(f_0; i)[1 - E(z \mid i)]$ for $i \in H_1$

     $\quad = 0$ for $i \in N$.

The functions $E(z \mid i)$ and $[1 - E(z \mid i)]$ are, respectively,
the probability of rejecting and accepting the null hypothesis with the
test $z$ when $i$ obtains. There is obviously nothing to choose between
them in importance or convenience, each being equivalent to the other.
They are commonly called the power function, and operating
characteristic, respectively.

In view of (2), one test $z$ dominates another $z'$, if and only if

(3)  $E(z \mid i) \le E(z' \mid i)$ for $i \in H_0$,

     $E(z \mid i) \ge E(z' \mid i)$ for $i \in H_1$;

or, again, if and only if the probability of error with $z'$ is at least
as great as with $z$ for every $i$. Thus, dominance, admissibility, and
equivalence depend on the basic loss function, $L(f_r; i)$, only in so
far as that function determines $H_0$ and $H_1$. This is not only
remarkable but also useful, for $H_0$ and $H_1$ may well be clearly
defined in contexts where the basic loss is vague, or otherwise ill
determined.

If $z$ is admissible in the spirit of (3) relative to a pair of sets
$H_0$ and $H_1$, then (if $\infty$ is for the moment admitted as a
possible value for a loss) there exists a basic loss function leading to
$H_0$ and $H_1$ and having $z$ as its essentially unique minimax. Indeed,
let

(4)  $L(f_0; i) = [1 - E(z \mid i)]^{-1}$ for $i \in H_1$

     $\quad = 0$ elsewhere,

     $L(f_1; i) = E(z \mid i)^{-1}$ for $i \in H_0$

     $\quad = 0$ elsewhere.

With this loss and reckoning $0 \cdot \infty = 0$ (as is appropriate
here), $L(z; i) = 1$ or 0, according as there is or is not positive
probability of making an error at $i$ with $z$. In view of (2) and (4),
any minimax $z'$ not equivalent to $z$ would strictly dominate $z$,
contrary to the assumption that $z$ is admissible. The moral of that
conclusion can be put thus: Without special assumptions about the basic
loss, the principle of admissibility and the minimax rule lead to no
criteria expressible solely in terms of $H_0$, $H_1$, and the conditional
distributions of the observation $x$ other than that of admissibility
itself. Whether some other objectivistic principle could justify such
criteria may be considered an open question, but, as I have already said
(in § 15.1), no other general objectivistic principles have been
seriously maintained.

It is natural, for example, to demand that $z$ have the same symmetry as
$P(x \mid i)$ and $H_0$ and $H_1$, but that criterion can surely not be
justified at all, unless the basic loss is also assumed to have the same
symmetry, the justifiability of which in turn depends on the case.

To take another important example, it is often proposed that a
satisfactory test must be unbiased,† that is, its power function must
never be higher in $H_0$ than in $H_1$. More formally, the test $z$ is
unbiased, if and only if

(5)  $E(z \mid i_0) \le E(z \mid i_1)$

for every $i_0 \in H_0$ and every $i_1 \in H_1$.

Assuming that $L(f_0; i)$ and $L(f_1; i)$ are constant in $H_1$ and
$H_0$, respectively, it will be shown that any minimax must be unbiased.
As a step toward that demonstration, consider a testing problem as a
minimax problem, without any special assumption about the basic loss. It
is possible that $L^* = 0$, in which case the minimax tests are all
equivalent and all unbiased. Putting that possibility aside, I assert,
and will show, that (under the usual mathematical simplifications)

(6)  $\max_{i \in H_0} L(z; i) = \max_{i \in H_1} L(z; i) = L^*$

for any minimax $z$. It is obvious that neither maximum exceeds $L^*$,
and also that one or the other must equal $L^*$. But suppose, for
example, that the second maximum were actually less than $L^*$, and
consider $z' = \alpha z$ with $0 < \alpha < 1$. According to (2), if $z'$
is substituted for $z$, the first maximum in (6) will be depressed, and,
for $\alpha$ sufficiently close to 1, the second would remain actually
less than $L^*$, which contradicts the assumption that $z$ is minimax,
establishing (6).

Now make the special assumption that

(7)  $L(f_0; i) = A$ for $i \in H_1$,

     $L(f_1; i) = B$ for $i \in H_0$,

and suppose that $z$ could be minimax but biased. There would then exist
$i_0 \in H_0$ and $i_1 \in H_1$ such that

(8)  $L^* = L(z; i_0) = BE(z \mid i_0) = A - AE(z \mid i_1) = L(z; i_1)$,

and such that $E(z \mid i_0) > E(z \mid i_1)$. But consideration of the
test that simply assigns to every $x$ the number midway between
$E(z \mid i_0)$ and $E(z \mid i_1)$ shows that $z$ could not be minimax.

† A definition unifying the various concepts of unbiasedness in
statistics is put forward in [L5].

The condition (7) is a reasonable assumption in some testing problems,
and, where (7) is satisfied, the criterion of unbiasedness has such
support as the minimax rule can give. In many other typical testing
problems, however, there are borderline errors that hardly matter at all
but can scarcely be prevented, and serious errors that can largely be
prevented. The following example, which can be varied to suit diverse
tastes, shows that it can be folly to insist on unbiasedness in such
problems.

Let $i$ take the three values 0, 1, 2, and let $x$ take the values 0
and 1 with conditional probabilities defined thus:

(9)    $P(0 \mid 0) = 99/100$,  $P(0 \mid 1) = 0$,  $P(0 \mid 2) = 1$.

Let the basic loss be defined by the condition that $i \in H_0$ or
$i \in H_1$, according as $i = 0$ or not, and by

(10)   $L(f_1, 0) = 1$,  $L(f_0, 1) = 1$,  $L(f_0, 2) = 1/101$.

Then

       $L(z, 0) = [99 z(0) + z(1)]/100$,
(11)   $L(z, 1) = 1 - z(1)$,
       $L(z, 2) = [1 - z(0)]/101$.

It is easily verified that the only minimax $z^*$ is defined by
$z^*(0) = 0$, $z^*(1) = 100/101$, and that $L(z^*, i) = L^* = 1/101$ for
every $i$. But it is also easily verified that the only unbiased tests
are absurd in that they ignore the observation $x$; they are in fact
just those for which $z(0) = z(1)$.
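
The two verifications left to the reader can be checked numerically.
The following is only a sketch: the function names are invented for
illustration, and the search over a finite grid of tests is an
assumption made so that exact rational arithmetic suffices.

```python
from fractions import Fraction as F

def risks(z0, z1):
    """Losses (11) of the test with z(0) = z0, z(1) = z1 at i = 0, 1, 2."""
    return (
        (99 * z0 + z1) / 100,  # L(z, 0)
        1 - z1,                # L(z, 1)
        (1 - z0) / 101,        # L(z, 2)
    )

def power(z0, z1):
    """E(z | i) for i = 0, 1, 2, computed from the probabilities (9)."""
    return ((99 * z0 + z1) / 100, z1, z0)

def is_unbiased(z0, z1):
    """Condition (5): power on H0 = {0} never exceeds power on H1 = {1, 2}."""
    e0, e1, e2 = power(z0, z1)
    return e0 <= e1 and e0 <= e2

# The minimax test named in the text has the same loss at every i.
z_star = (F(0), F(100, 101))
print(risks(*z_star))

# Hypothetical grid of tests with values k/101; it contains z_star.
grid = [F(k, 101) for k in range(102)]
best = min((max(risks(a, b)), a, b) for a in grid for b in grid)
unbiased = [(a, b) for a in grid for b in grid if is_unbiased(a, b)]
print(best)
print(all(a == b for a, b in unbiased))
```

On this grid the smallest maximum loss is attained only at $z^*$, which
is itself biased, while every unbiased test found satisfies
$z(0) = z(1)$.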

It has until quite recently been said by many that attention should
be confined to tests such that there is a fixed probability $\alpha$
(called the size of the test) of making an error of the first kind for
every $i \in H_0$. Indeed, the criterion of size has often been taken so
seriously as to be incorporated into the very definition of a test.
Though many important tests happen to have a size, others equally
important do not, so it now seems to be recognized [L4] that the
possession of a size cannot be taken seriously as a criterion.† To take
an everyday example, consider the binomial distributions

(12)   $P(x \mid p) = \binom{101}{x} p^x (1 - p)^{101 - x}$,

where the parameter $p$ confined to [0, 1] plays the role of $i$ and
$x = 0, \dots, 101$, and suppose that $H_0$ is the hypothesis that
$p \le 1/2$. A test of size $\alpha$ is a test for which

(13)   $\sum_x z(x) \binom{101}{x} p^x (1 - p)^{101 - x} = \alpha$

for all $p \le 1/2$. This obviously implies

(14)   $\sum_x [z(x) - \alpha] \binom{101}{x} p^x (1 - p)^{101 - x} = 0$

for all $p \le 1/2$, whence $z(x) = \alpha$ for every $x$. So only
absurd tests have size, in this example, though there are clearly tests
here that are quite satisfactory for many applications; for example,
let $z(x)$ equal 0 or 1 according as $x \le 50$ or $x > 50$.
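
As a rough numerical illustration of this example, the sketch below
evaluates the left side of (13) exactly for the cutoff test just
mentioned and for a constant test. The helper names are hypothetical,
and the particular values of $p$ are arbitrary choices.

```python
from fractions import Fraction
from math import comb

N = 101  # number of binomial trials in (12)

def power(z, p):
    """Left side of (13): E(z | p) under the binomial distribution (12)."""
    return sum(z(x) * comb(N, x) * p**x * (1 - p)**(N - x)
               for x in range(N + 1))

def z_cut(x):
    """The test from the text: choose f1 exactly when x > 50."""
    return 1 if x > 50 else 0

def z_const(x):
    """A test that ignores x; 1/20 is an arbitrary illustrative level."""
    return Fraction(1, 20)

quarter, half = Fraction(1, 4), Fraction(1, 2)

# The cutoff test has no size: its error probability varies over H0.
print(float(power(z_cut, quarter)), float(power(z_cut, half)))

# A constant test does have a size, but it ignores the observation.
print(power(z_const, quarter), power(z_const, half))
```

By the symmetry of the binomial distribution at $p = 1/2$, the largest
error probability of the cutoff test over $p \le 1/2$ is exactly 1/2.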
In view of the criticism just made, there is a tendency to redefine
size so that any test has a size $\alpha$, namely,

(15)   $\alpha =_{\mathrm{Df}} \max_{i \in H_0} E(z \mid i)$.

In terms of this definition of size, a concept of testing somewhat
different from that proposed in this section has been defined and
defended (Wald, p. 21 of [W3], and Lehmann, pp. 17-18 of [L4]); namely,
it is postulated that a test is to be chosen not from among all possible
tests, but only from among those having a size $\alpha$ (in the sense of
(15)) given as part of the testing problem.‡ This concept of testing is
not defended to the exclusion of the one proposed here, but it is
asserted by the authors cited to be more realistic for some problems.
The arguments of both authors on this point are similar and, I think,
quite weak in two crucial places, for the advantage is supposed to flow
in some unspecified way from the undemonstrated impossibility of
comparing preferences for consequences of qualitatively different
kinds. It seems, if I may be allowed such a conjecture, that the concept
of testing under a constraint of size represents a Procrustean attempt
to fit the (older) Neyman-Pearson theory of testing hypotheses too
closely with the (newer) minimax theory. It is not to be denied, of
course, that there may sometimes be a mathematical advantage in studying
and comparing tests of given size.

† Statisticians interested in the Behrens-Fisher problem may be
interested in pp. 36.173a-b of [F6], which hinge on the question of size
as a criterion.

‡ The constraint actually imposed, especially by Lehmann [L4], is that
the size be at most $\alpha$. But, as Lehmann explains, this difference
is more apparent than real.

It should be mentioned, before concluding the subject, that any
theory taking size seriously introduces an asymmetry of the theory with
respect to $H_0$ and $H_1$, an asymmetry that is surely not always
appropriate.

Significance level, or level of significance, is a synonym (neglecting
a slight distinction made in [L4]) of size, probably more widely used
than size itself.

3 Testing in practice

The theory of testing admits some fairly realistic applications, but
the present state of statistics is such that the theory of testing is
invoked more often than not in problems on which it does not bear
squarely. This section discusses typical applications of the theory,
pointing out the shortcomings I am aware of.

The development of the theory of testing has been much influenced
by the special problem of simple dichotomy, that is, testing problems
in which $H_0$ and $H_1$ have exactly one element each. Simple dichotomy
is susceptible of neat and full analysis (as in Exercise 7.5.2 and in
§ 14.4), likelihood-ratio tests here being the only admissible tests,
and simple dichotomy often gives insight into more complicated problems,
though the point is not explicitly illustrated in this book.

Coin and ball examples of simple dichotomy are easy to construct,
but instances seem rare in real life. The astronomical observations
made to distinguish between the Newtonian and Einsteinian hypotheses
are a good, but not perfect, example, and I suppose that research in
Mendelian genetics sometimes leads to others. There is, however, a
tradition of applying the concept of simple dichotomy to some situations
to which it is, to say the best, only crudely adapted. Consider, for
example, the decision problem of a person who must buy, $f_0$, or refuse
to buy, $f_1$, a lot of manufactured articles on the basis of an
observation $x$. Suppose that $i$ is the difference between the value of
the lot to the person and the price at which the lot is offered for
sale, and that $P(x \mid i)$ is known to the person. Clearly, $H_0$,
$H_1$, and $N$ are sets characterized respectively by $i > 0$, $i < 0$,
$i = 0$. This analysis of this and similar problems has recently been
explored in terms of the minimax rule, for example by Sprowls [S16] and
a little more fully by Rudy [R4], and by Allen [A3]. It seems to me
natural and promising for many fields of application, but it is not a
traditional analysis. On the contrary, much literature recommends, in
effect, that the person pretend that only two values of $i$, $i_0 > 0$
and $i_1 < 0$, are possible and that the person then choose a test for
the resulting simple dichotomy. The selection of the two values $i_0$
and $i_1$ is left to the person, though they are sometimes supposed to
correspond to the person's judgment of what constitutes good quality and
poor quality, terms really quite without definition. The emphasis on
simple dichotomy is tempered in some acceptance-sampling literature,
where it is recommended that the person choose among available tests by
some largely unspecified overall consideration of operating
characteristics and costs, and that he facilitate his survey of the
available tests by focusing on a pair of points that happen to interest
him and considering the test whose operating characteristic passes
(economically, in the case of sequential testing) through the pair of
points. These traditional analyses are certainly inferior in the
theoretical framework of the present discussion, and I think they will
be found inferior in practice.

To make a small digression, there is a complication in connection with
testing whether to buy that is not ordinarily envisaged by statistical
theory, namely, the economic reaction between the buyer and the
supplier. If, for example, the supplier knows the test the buyer is
going to apply, that knowledge will influence the quality of the lot
supplied. There seems to be little, if any, successful work on the
economic problem thus raised about the game-like behavior of the two
people involved (cf. pp. 331, 340, and 346 of [W6]).

The problem whether to buy a lot obviously has many formal
counterparts in other domains. In some of them it is particularly clear
that purely objectivistic methods do not suffice. To illustrate, imagine
two experiments: one designed to determine whether it is advantageous to
add a certain small amount of sodium fluoride to the drinking water of
children, the other to determine whether the same amount of oil of
peppermint is advantageous. Granting that each of the two additions
can be made at the same cash cost for labor and material and that the
designs of the two hypothetical experiments differ only in the
interchange of the roles of sodium fluoride and oil of peppermint, the
corresponding testing problems are objectivistically completely
parallel, that is, the same with regard to loss function and conditional
probability of the observations. But it must be acknowledged, I think,
that the people actually charged with the decision in either of these
two cases would and should take into account opinions they had before
the observation. For example, they might originally have considered it
nearly impossible that the oil of peppermint could result in any
hygienic advantage large enough to compensate for even the small cost of
its administration, but, in view of recent dental researches on the
subject, they might not have considered it at all unlikely that the
sodium fluoride should have an overall advantage. In that case, parallel
observations in the two experiments would not always lead to parallel
decisions. Objectivists typically admit such a possibility but go on to
say that it is unreasonable to isolate the experiment and that it is the
totality of information bearing on the subject that should be treated
objectivistically. If objectivists could give a more detailed discussion
of how to deal with such a totality of information, it might do much to
clarify their position.

I turn now to a different and, at least for me, delicate topic in
connection with applications of the theory of testing. Much attention is
given in the literature of statistics to what purport to be tests of
hypotheses, in which the null hypothesis is such that it would not
really be accepted by anyone. The following three propositions, though
playful in content, are typical in form of these extreme null
hypotheses, as I shall call them for the moment.

A. The mean noise output of the cereal Krakl is a linear function of
the atmospheric pressure, in the range from 900 to 1,100 millibars.

B. The basal metabolic consumption of sperm whales is normally
distributed [W11].

C. New York taxi drivers of Irish, Jewish, and Scandinavian extraction
are equally proficient in abusive language.

Literally to test such hypotheses as these is preposterous. If, for
example, the loss associated with $f_1$ is zero, except in case
Hypothesis A is exactly satisfied, what possible experience with Krakl
could dissuade you from adopting $f_1$?

The unacceptability of extreme null hypotheses is perfectly well
known; it is closely related to the often heard maxim that science
disproves, but never proves, hypotheses. The role of extreme hypotheses
in science and other statistical activities seems to be important but
obscure. In particular, though I, like everyone who practices
statistics, have often "tested" extreme hypotheses, I cannot give a very
satisfactory analysis of the process, nor say clearly how it is related
to testing as defined in this chapter and other theoretical discussions.
None the less, it seems worth while to explore the subject tentatively;
I will do so largely in terms of two examples.

Consider first the problem of a cereal dynamicist who must estimate
the noise output of Krakl at each of ten atmospheric pressures between
900 and 1,100 millibars. It may well be that he can properly regard the
problem as that of estimating the ten parameters in question, in which
case there is no question of testing. But suppose, for example, that
one or both of the following considerations apply. First, the engineer
and his colleagues may attach considerable personal probability to the
possibility that A is very nearly satisfied (very nearly, that is, in
terms of the dispersion of his measurements). Second, the
administrative, computational, and other incidental costs of using ten
individual estimates might be considerably greater than that of using a
linear formula. It might be impractical to deal with either of these
considerations very rigorously. One rough attack is for the engineer
first to examine the observed data $x$ and then to proceed either as
though he actually believed Hypothesis A or else in some other way. The
other way might be to make the estimate according to the objectivistic
formulae that would have been used had there been no complicating
considerations, or it might take into account different but related
complicating considerations not explicitly mentioned here, such as the
advantage of using a quadratic approximation. It is artificial and
inadequate to regard this decision between one class of basic acts or
another as a test, but that is what in current practice we seem to do.
The choice of which test to adopt in such a context is at least partly
motivated by the vague idea that the test should readily accept, that
is, result in acting as though the extreme null hypotheses were true, in
the farfetched case that the null hypothesis is indeed true, and that
the worse the approximation of the null hypotheses to the truth the less
probable should be the acceptance.

The method just outlined is crude, to say the best. It is often
modified in accordance with common sense, especially so far as the
second consideration is concerned. Thus, if the measurements are
sufficiently precise, no ordinary test might accept the null hypotheses,
for the experiment will lead to a clear and sure idea of just what the
departures from the null hypotheses actually are. But, if the engineer
considers those departures unimportant for the context at hand, he will
justifiably decide to neglect them.

Rejection of an extreme null hypothesis, in the sense of the foregoing
discussion, typically gives rise to a complicated subsidiary decision
problem. Some aspects of this situation have recently been explored,
for example by Paulson [P3], [P4]; Duncan [D11], [D12]; Tukey [T4],
[T5]; Scheffé [S7]; and W. D. Fisher [F7].

To summarize abstractly, I would say that, in current practice,
so-called tests of extreme hypotheses are resorted to when at least a
little credence is attached to the possibility that the null hypothesis
is very nearly true and when there is some special advantage to behaving
as though it were true. One other illustration will make it clear that
point estimation is not essential to the situation and that belief in
the approximate truth of the null hypothesis alone does not always
justify testing.

Consider the personnel manager of a great New York taxi company.
Wishing, of course, that his drivers should be as proficient as
possible, he would, under simple circumstances, hire exclusively from
the national-extraction group that had obtained the highest mean scores
in a standard proficiency examination, for why should he not be guided
by a positive indication, however slight? A statistical test of the
extreme Hypothesis C would not, therefore, be called for, as has been
pointed out in general terms by Bahadur and Robbins [B3]. Even strong
belief that ethnic differences are extremely small in the respect in
question would not alone be any reason for departing from this simple
policy, dictated by the principle of admissibility, quite in contrast to
the example framed around Hypothesis A. If, however, public opinion, a
shortage of labor, or administrative difficulty militates against any
discrimination at all, the manager may resort to a test based on the
examination scores.

In practice, tests of extreme hypotheses are typically chosen from a
relatively small arsenal of standard types, or families, each family
consisting of one unmixed test at every significance level (as size is
always called in this context). In publications, it is standard practice
not simply to report the result of a test, but rather to report that
level of significance for which the corresponding test of the relevant
family would be on the borderline between acceptance and rejection. The
rationale usually given for this procedure is that it enables each user
of the publication to make his own test at the significance level he
deems appropriate to his particular problem. Thus the significance level
is supposed to play much the same practical role as a sufficient
statistic.

An interesting contribution to the theory of extreme hypotheses is
given by Bahadur [B1] in the special context of the two-sided $t$-test.

Interval Estimation
and Related Topics

1 Estimates of the accuracy of estimates 

The doctrine is often expressed that a point estimate is of little, or
no, value unless accompanied by an estimate of its own accuracy. This
doctrine, which for the moment I will call the doctrine of accuracy
estimation, may be a little old-fashioned, but I think some critical
discussion of it here is in order for two reasons. In the first place,
the doctrine is still widely considered to contain more than a grain of
truth. For example, many readers will think it strange, and even remiss,
that I have written a long chapter (Chapter 15) on estimation without
even suggesting that an estimate should be accompanied by an estimate of
its accuracy. In the second place, it seems to me that the concept of
interval estimation, which is the subject of the next section, has
largely evolved from the doctrine of accuracy estimation and that
discussion of the doctrine will, for some, pave the way for discussion
of interval estimation.

The doctrine of accuracy estimation is vague, even by the standards
of the verbalistic tradition, for it does not say what should be taken
as a measure of accuracy, that is, what an estimate of accuracy should
estimate. Any measure would be rather arbitrary; a typical one, here
adopted for definiteness, is the root-mean-square error,

(1)    $E^{1/2}([\mathbf{l} - \lambda(i)]^2 \mid B_i) = \{V(\mathbf{l} \mid B_i) + [E(\mathbf{l} \mid B_i) - \lambda(i)]^2\}^{1/2}$,

using (15.6.23). The root-mean-square error reduces to the standard
deviation, $V^{1/2}(\mathbf{l} \mid B_i)$, in case the estimate
$\mathbf{l}$ is unbiased.

Taking the doctrine literally, it evidently leads to endless regression,
for an estimate of the accuracy of an estimate should presumably be
accompanied by an estimate of its own accuracy, and so on forever.

Even supposing that the doctrine were somehow purged of vagueness
and endless regression, it would still be in clear conflict with the
behavioralistic concept of estimation studied in Chapter 15. If a
decision problem consists in deciding on a number in the light of an
observation, the person concerned wants to adopt an $\mathbf{l}$ that
is, in some sense or other, as good as possible; but, since he must make
some decision, it could at most satisfy idle curiosity to know how good
the best is. Idle, I say, because, his decision once made, there is no
way to use knowledge of its accuracy.

Since it seems to me that the kind of problem envisaged in Chapter
15 is of frequent occurrence and may properly be called estimation,
I am inclined to say that the doctrine of accuracy estimation is
erroneous. However, it is possible that someone should point out a
different class of problems, also properly called problems of
estimation, with respect to which the doctrine has some validity,
though, so far as I know, this has not yet occurred.

One sort of situation that might, through what I would consider
faulty analysis, seem to support the doctrine of accuracy estimation is
illustrated by the following, highly schematized example. A person
has to estimate the number $n$ of replacement parts of a certain sort
that should be carried by an expedition. He can conduct a trial the
outcome of which will, let us say, be an observation $x$ distributed in
the Poisson distribution with mean equal to $\alpha c n$; that is,

(2)    $P(x \mid n) = e^{-\alpha c n} (\alpha c n)^x / x!$,

where $\alpha$ is a known constant and $c$, which the person can choose,
is the cost (beyond overhead) of the trial. Under reasonable hypotheses,
once $c$ has been chosen and the value $x$ observed,
$\mathbf{n}(x) = x/\alpha c$ is a good estimate of $n$; and in so far as
the problem is of the type envisaged in Chapter 15, that is the end of
the matter.

But there may be features of the problem that have not yet been
stated, though in principle they should have been. In particular, it
may be that the person is free to conduct a second trial, though there
will typically be a high penalty for doing so. One rough, but sometimes
natural and practical, step toward deciding whether a second trial is
called for is to remark that $(\mathbf{n}/\alpha c)^{1/2}$ is a good
estimate of the root-mean-square error of $\mathbf{n}$ and may give a
fairly good basis on which to judge whether the risk of misestimation
warrants the expense of a second trial.
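
The arithmetic of this rough step can be sketched as follows. The
function names and the numerical inputs are invented for illustration;
the only facts used are that a Poisson variable with mean
$\alpha c n$ also has variance $\alpha c n$, so that $x/\alpha c$ has
variance $n/\alpha c$.

```python
from math import sqrt

def estimate_n(x, alpha, c):
    """Point estimate x / (alpha * c) of n from a single trial."""
    return x / (alpha * c)

def estimated_rmse(x, alpha, c):
    """Plug-in estimate (n_hat / (alpha * c)) ** 0.5 of the RMS error.

    The true variance of x / (alpha * c) is n / (alpha * c); the
    unknown n is replaced here by its estimate.
    """
    return sqrt(estimate_n(x, alpha, c) / (alpha * c))

# Hypothetical inputs: alpha = 0.5, trial cost c = 4, observed x = 30.
n_hat = estimate_n(30, 0.5, 4)
err = estimated_rmse(30, 0.5, 4)
print(n_hat, round(err, 3))
```

Quadrupling $c$ would halve the estimated error for the same estimate
of $n$, which is the sort of trade-off the person would weigh against
the penalty of a second trial.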

My own conviction is that we should frankly regard such a problem
as has just been described as a special problem in sequential analysis
and treat it as an organic whole. Viewed thus, $c$ is to be chosen in
the light of the possibility of making a second trial. The decision to
be based on $x$ is the complex one of whether to go to the expense of a
second trial; if so, of what magnitude; and, if not, what estimate of
$n$ to adopt.

Another sort of situation that seems to have stimulated the doctrine
of accuracy estimation is the following. Suppose that a research worker
has observed $x_1, \dots, x_n$, which are independent and normally
distributed about the mean $\mu$ with variance $\sigma^2$, given $\mu$
and $\sigma$. If he wishes to publish the results of his investigation
for all concerned to use as their own needs and opinions may dictate, he
should, ideally, publish a sufficient statistic of his observation,
stating how it is distributed given $\mu$ and $\sigma$. Any other course
may deprive some reader of some information he might be able to put to
use. So far as the primary aim is concerned, all sufficient statistics
are equivalent, but secondary considerations greatly narrow the research
worker's choice. To illustrate, consider the five sufficient statistics
the values of which for $\{x_1, \dots, x_n\}$ are:

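(a) $\{x_1, \dots, x_n\}$.

(b) The $n$ order statistics of $\{x_1, \dots, x_n\}$.

(c) $\sum x_i$ and $\sum x_i^2$.

(d) $\bar{x} = \sum x_i / n$ and $s^2 =_{\mathrm{Df}} (\sum x_i^2 - \bar{x} \sum x_i)/(n - 1)$.

(e) $\bar{x}$ and $s/n^{1/2}$.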

If $n$ is at all large, (c), (d), and (e) are cheaper to publish than
(a) and (b). Moreover, for almost any use to which a reader might wish
to put the data, (c), (d), and (e) will save him a considerable amount
of computation. In so far as it is true that almost any reader who has
a use for the data at all will use $\bar{x}$, but not necessarily
$\sum x_i$, statistics like (d) and (e) are slightly preferable to (c).
There is something to be said both for (d) and for (e), in view of the
ready availability of certain tables; but, at least when $n$ is very
large, there is a slight advantage to (e) for those calculations a
reader is most likely to perform. In particular, a reader using (e) can,
when $n$ is large, often ignore the actual value of $n$. Even if the
distributions of the $x_1, \dots, x_n$ are not exactly normal, (c), (d),
and (e) often can play almost the same role as sufficient statistics. It
is no wonder then that (e) is often chosen as a convenient way to
present data. But, in my opinion, it is a mistake to lay great
theoretical emphasis on the fact that (e) happens to consist of what is
ordinarily a good estimate of $\mu$, namely $\bar{x}$, together with
what is ordinarily a good estimate of the root-mean-square error of that
estimate, namely $s/n^{1/2}$.
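
The passage from (c) to (d) to (e) is a matter of simple arithmetic,
which the following sketch makes explicit. The function names and the
sample data are hypothetical; the formula for $s^2$ is the usual
sample variance with divisor $n - 1$.

```python
from math import sqrt

def stat_c(xs):
    """Statistic (c): the sum and the sum of squares of the observations."""
    return sum(xs), sum(x * x for x in xs)

def c_to_d(n, total, total_sq):
    """Statistic (d): the sample mean and the sample variance s^2."""
    mean = total / n
    var = (total_sq - mean * total) / (n - 1)
    return mean, var

def d_to_e(n, mean, var):
    """Statistic (e): the sample mean and s / sqrt(n)."""
    return mean, sqrt(var / n)

xs = [2.0, 4.0, 4.0, 6.0]  # invented data, for illustration only
n = len(xs)
total, total_sq = stat_c(xs)
mean, var = c_to_d(n, total, total_sq)
print((total, total_sq), (mean, var), d_to_e(n, mean, var))
```

Given $n$, each of the three can be recovered from the others, which is
why all three carry the same information about $\mu$ and $\sigma$.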

2 Interval estimation and confidence intervals 

The verbalistic tradition has suggested a procedure different from
point estimation but somehow related to it. This other procedure, here
called interval estimation, can be defined as follows, though the
definition is necessarily vague. Where $x$ is an observation subject to
the conditional distributions $P(x \mid B_i)$ and $\lambda(i)$ is a
function of $i$, guess that $\lambda(i)$ lies in some set $M(x)$ (to be
called an interval estimate) determined for each value of $x$. It is
almost a part of the definition to say that the function $M(x)$ is to be
so chosen that $P(\lambda(i) \in M(x) \mid B_i)$ shall be nearly 1 for
every $i$ and that $M(x)$ should tend to be small and "close knit" in a
geometrical sense, some compromise being effected between these two
conflicting desiderata. The parameter $\lambda(i)$ could in principle be
a very general function, but it will here be enough to suppose for
definiteness and simplicity that $\lambda(i)$ is real. Though more
general possibilities are contemplated in principle, the set $M(x)$ is
in practice typically a bounded interval, which corresponds with what I
meant in saying that $M(x)$ is supposed to be "close knit."

The idea of interval estimation is complicated; an example is in order.
Suppose that, for each $\lambda$, $x$ is a real random variable normally
distributed about $\lambda$ with unit variance; then, as is very easy to
see with the aid of a table of the normal distribution, if $M(x)$ is
taken to be the interval $[x - 1.9600, x + 1.9600]$, then

(1)    $P(\lambda \in M(x) \mid \lambda) = \alpha$,

where $\alpha$ is constant and almost equal to 0.95.
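
In place of a table of the normal distribution, the constant in (1)
can be computed from the error function, and the interpretation of (1)
as a statement about the random interval can be checked by simulation.
This is only an illustrative sketch; the function names, the seed, and
the choice of $\lambda$ are arbitrary assumptions.

```python
import random
from math import erf, sqrt

def coverage(half_width):
    """P(|x - lam| <= half_width) for x normal about lam, unit variance."""
    return erf(half_width / sqrt(2))

def simulated_coverage(lam, half_width, trials, seed=0):
    """Fraction of simulated intervals [x - h, x + h] that contain lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gauss(lam, 1.0)  # lam is fixed; only x is random
        if x - half_width <= lam <= x + half_width:
            hits += 1
    return hits / trials

print(round(coverage(1.96), 6))
print(simulated_coverage(lam=3.7, half_width=1.96, trials=20000))
```

The exact value does not depend on $\lambda$, which is what makes the
probability in (1) a constant.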

It is usually thought necessary to warn the novice that such an
equation as (1) does not concern the probability that a random variable
$\lambda$ lies in a fixed set $M(x)$. Of course, $\lambda$ is given and
therefore not random in the context at hand; and, given $\lambda$,
$\alpha$ is the probability that $M(\mathbf{x})$, which is a contraction
of $\mathbf{x}$, has as its value an interval that contains $\lambda$.

Why seek an interval estimate? One sort of verbalistic answer runs
like this. At first glance, the problem of estimation seems to require
that a person guess, on observing that $\mathbf{x}$ takes the value $x$,
that $\lambda(i)$ has some particular value $\mathbf{l}(x)$; but, since
it is virtually impossible that such a guess should be correct, it seems
better to try something else. In particular, it is often possible to
assert that $\lambda(i)$ is in a comparatively narrow interval $M(x)$,
chosen according to such a system that it is very improbable for each
$i$ that the assertion will be false. Less extreme verbalistic
explanations tend to give the impression that point estimation need not
be altogether rejected, but that interval estimation satisfies a
parallel need.

The first part of the explanation just cited is specious, since no one
really expects a point estimate to be correct, and since, when one
really is obliged by circumstances to make a point estimate in the
behavioralistic sense, there is no escaping it. None the less, that part
of the explanation does seem to give some insight into the appeal of
interval estimation. The second part of the explanation is a sort of
fiction, for it will be found that whenever its advocates talk of making
assertions that have high probability, whether in connection with
testing or estimation, they do not actually make such assertions
themselves, but endlessly pass the buck, saying in effect, "This
assertion has arisen according to a system that will seldom lead you to
make false assertions if you adopt it. As for myself, I assert nothing
but the properties of the system."

From the behavioralistic point of view, I maintain that point
estimation fulfils an important function. On the other hand, I can cite
no important behavioralistic interpretation of interval estimation.
Moreover, in such direct and indirect contact as I have had with actual
statistical practice, I have (with but one extraordinary exception,
which will soon be discussed) encountered no applications of interval
estimation that seemed convincing to me as anything more than an
informal device for exploring data or crudely summarizing it for others.
In short, not being convinced myself, I am in no position to present
convincing evidence for the usefulness of interval estimation as a
direct step in decision. The reader should know, however, that few are
as pessimistic as I am about interval estimation and that most leaders
in statistical theory have a long-standing enthusiasm for the idea,
which may have more solid grounds than I now know.


The following is a schematized example of one sort of decision
problem that does call for something like interval estimation. An
observation $x$ bears on the position $\lambda$ of a lifeboat, the
occupants of which will be saved or lost, according as the boat is or is
not sighted by a searching aircraft before nightfall. The decision
problem is, therefore, to choose, from all the domains that the airplane
could search in time, one domain $M(x)$, and the loss must, in effect,
be reckoned as 0 or 1 according as $M(x)$ does or does not contain
$\lambda$. This type of problem seems, however, too rare and too special
to be taken as representative of those for which interval estimation is
so widely advocated.

Many criteria have been put forward for interval estimation, but I
am of course in no position to discuss them critically. J. Neyman has
gone about the search for criteria systematically, setting up a
parallelism between the theory of interval estimation and of testing. In
particular, paralleling the criterion of fixed size for tests, he has
emphasized interval estimates such that

(2)    $P(\lambda(i) \in M(x) \mid B_i) = \alpha$

for a fixed $\alpha$ (typically close to 1) and for every $i$. Such
interval estimates are called confidence intervals at the confidence
level $\alpha$. The interval estimate mentioned in connection with (1)
is obviously a confidence interval. Wald [W3] sought to include the
theory of confidence intervals in the minimax theory, but in my opinion
he did not succeed in giving interval estimation a behavioralistic
interpretation.

Though I am in no position to criticize any criterion of interval
estimation, I venture to ask whether (2) is not gratuitous, as I have
more positively asserted of its analogue in the theory of testing.

Chapters 19 and 20 of [K2] will serve as key references for interval
estimation.

3 Tolerance intervals 

There has recently been considerable study of what are called
tolerance intervals (or limits). They are related to the problem of
guessing the actual value of a real random variable $y$, on the basis of
an observation of $x$. A tolerance interval for $y$ at tolerance level
$\alpha$ and confidence level $\beta$ is an interval-valued function
$Y(x)$ such that

(1)    $P[P(y \in Y(x) \mid B_i, x) \ge \alpha \mid B_i] = \beta$

for every $i$.

The concept expressed by (1) is a slippery one; perhaps it will help
to express it in words thus: For every $i$ there is probability $\beta$
that $x$ is such that $y$ will fall in $Y(x)$ with probability at least
$\alpha$, given $B_i$ and $x$. In typical applications $y$ is
independent of $x$; this permits a slight simplification of the
definition. The notion of tolerance interval seems to me at least as
unamenable to behavioralistic interpretation as that of confidence
interval, and I therefore venture no discussion of it here. Key
references are [B22] and [W7].

4 Fiducial probability 

This is not really a section on fiducial probability, but rather an
apology for not having such a section. The concept of fiducial
probability put forward and stressed by R. A. Fisher is the most
disputed technical concept of modern statistics, and, since the concept
is largely concerned with interval estimation, I wanted to discuss it
here. I have, however, been privileged to see certain as yet unpublished
manuscripts of R. M. Williams [W12] and J. W. Tukey which convince me
that such discussion by me now would be premature.

Some key references to fiducial probability and to the Behrens-Fisher 
problem, which is the most disputed field of application of fiducial 
probability, are Fisher's own papers, especially [F5], and Papers 22, 
25, 26, 27, and 35 of the collection [F6]; Kendall [K2], Chapter 20, 
Yates [Yl], Owen [01], Segal [S9], Bartlett [B6], Scheff6 [S6], [S5], 
Walsh [W9], and Chand [C5]. 
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Expected Value 

This appendix, a brief account of some relatively elementary aspects
of the badly named mathematical concept, expected value, is presented
for those who might otherwise be handicapped in reading this book.
No proofs are given here, but the reader who needs this appendix will
probably be willing and able to accept the facts cited without proof,
especially if he acquires intuition for the subject by working the
suggested exercises. The requisite proofs are, however, given implicitly
in any standard work on integration or measure (e.g., Chapters I-V of
[H2]).

Throughout this appendix, let S be a set with elements s and subsets
A, B, C, ... on which a (finitely additive) probability measure P is
defined. Bounded real random variables, that is, bounded real-valued
functions, defined for each $s \in S$, will here be denoted by x, y, ..., and
real numbers by x, z, and lower-case Greek letters.
The expected value of x, generally written E(x), is characterized as
the one and only function attaching a real number to every bounded
random variable x, subject to the following three conditions for every
x, y, $\rho$, $\sigma$, and B:

(1) $E(\rho x + \sigma y) = \rho E(x) + \sigma E(y)$

(2) $E(x) \ge 0$ whenever $P(x(s) < 0) = 0$

(3) $E(c(\cdot \mid B)) = P(B)$

In (3), $c(\cdot \mid B)$ is the characteristic function of B, that is,
$c(s \mid B) = 1$ if $s \in B$, and $c(s \mid B) = 0$ if $s \in \sim B$. In
mathematical contexts remote from the topics in this book, the term
“characteristic function” has at least two other meanings virtually
unconnected with the one at hand, one in connection with linear operators
on function spaces, and another in connection with the Fourier analysis
of distributions.
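
Conditions (1)-(3) can be checked concretely on a finite set, where the expected value is just a probability-weighted sum. The following is only a sketch: the four-point space, its probabilities, and the helper names `prob`, `expect`, and `indicator` are all assumptions made for illustration. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# A hypothetical four-point space S with the probability of each point.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4),
     "c": Fraction(1, 8), "d": Fraction(1, 8)}

def prob(event):
    """P(B) for a subset B of S."""
    return sum(P[s] for s in event)

def expect(x):
    """E(x) for a random variable x given as a function on S."""
    return sum(x(s) * P[s] for s in P)

def indicator(event):
    """Characteristic function c(. | B) of the event B."""
    return lambda s: 1 if s in event else 0

# Two bounded random variables and an event, chosen arbitrarily.
x = {"a": 3, "b": -1, "c": 0, "d": 4}.get
y = {"a": 1, "b": 2, "c": 2, "d": -5}.get
B = {"b", "d"}

# Condition (1): linearity, here with rho = 2 and sigma = -3.
assert expect(lambda s: 2 * x(s) - 3 * y(s)) == 2 * expect(x) - 3 * expect(y)
# Condition (2): a random variable that is never negative has E >= 0.
assert expect(lambda s: x(s) ** 2) >= 0
# Condition (3): the expected value of c(. | B) is P(B).
assert expect(indicator(B)) == prob(B)
```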

Often the expected value of x is referred to as the integral of x over
S, in which case it is generally written $\int x(s)\, dP(s)$.

Exercises

1. If x takes only a finite number of values, $x_1, \ldots, x_n$, except on a
set of probability zero, then

(4) $E(x) = \sum_{i=1}^{n} x_i P(x(s) = x_i)$,

that is, the average of the $x_i$'s, each weighted by the probability of its
occurrence.

2. If $P(x(s) < y(s)) = 0$, then $E(x) \ge E(y)$; and if, in addition,
$P(x(s) > y(s) + \epsilon) > 0$ for some $\epsilon > 0$, then $E(x) > E(y)$.†

3. If x is a real random variable, $B_i$ a partition, and $\rho_i$, $\sigma_i$ real
numbers such that $\rho_i \le x(s) \le \sigma_i$ for $s \in B_i$, then

(5) $\sum_i \rho_i P(B_i) \le E(x) \le \sum_i \sigma_i P(B_i)$.

4. $c(\cdot \mid A \cap B) = c(\cdot \mid A)\, c(\cdot \mid B)$,

$c(\cdot \mid \sim A) = 1 - c(\cdot \mid A)$,

$c(\cdot \mid A \cup B) = c(\cdot \mid A) + c(\cdot \mid B) - c(\cdot \mid A)\, c(\cdot \mid B)$.

As is explained in texts on measure theory, the expected value can
(at least for countably additive measures), and in practice must, be
extended to many unbounded random variables.
Since, provided P(B) > 0, the conditional probability, defined by
$P(C \mid B) = P(C \cap B)/P(B)$, is itself a probability measure, the
expectation of x with respect to a conditional probability is a meaningful
concept. This conditional expectation is written $E(x \mid B)$ and read
“the expected value of x given B.”

More exercises 

5. $E(x \mid B) = E(x\, c(\cdot \mid B))/P(B)$. Hint: It suffices to verify that
the expression on the right satisfies the three conditions parallel to
(1-3) that define $E(x \mid B)$.

6. If $B_i$ is a partition of S, then

(6) $\sum_i c(s \mid B_i) = 1$ for every s.

7. $E(x) = \sum_i E(x \mid B_i) P(B_i)$. Hint: Use $x = \sum_i x\, c(\cdot \mid B_i)$.

† Technical note: In the event that P is countably additive,
$P(x(s) > y(s)) > 0$ implies the existence of a suitable $\epsilon$, so then
$\epsilon$ need not be mentioned at all.

Suppose y is a (not necessarily real) random variable that takes on
only a finite number of values. It will be understood that $E(x \mid y)$ is
the expected value of x given that y(s) = y, provided y is such that
this event has positive probability. Furthermore, it will be understood
that $E(x \mid y)$ is a bounded real random variable that for each s takes
the value $E(x \mid y(s))$. The definition leaves $E(x \mid y)$ undefined on the
null set of those points s where y(s) is a value that y takes on with
probability zero. It is immaterial how this blemish is removed; in
particular $E(x \mid y)$ may as well be set equal to 0, where it has not
already been defined.

Still more exercises

8. $E(E(x \mid y)) = E(x)$.

9. If f is a real-valued function defined on the values of y, then f(y)
is a bounded real random variable, and

(7) $E(f(y)\, x) = E(f(y)\, E(x \mid y))$.

10. If h(y) is such that, for all f,

(8) $E(f(y)\, x) = E(f(y)\, h(y))$,

then $h(y(s)) = E(x \mid y(s))$, except possibly on a set of s's of probability
zero.
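
Exercises 8 and 9 can be tried out numerically before they are proved. The sketch below assumes a hypothetical six-point space with equal probabilities (a fair die) and arbitrary choices of x, y, and f; the helper `cond_expect` is an illustrative name and builds E(x | y) as a random variable in the sense just described.

```python
from fractions import Fraction

# Hypothetical six-point space with equal weights (a fair die).
S = range(1, 7)
P = {s: Fraction(1, 6) for s in S}

def expect(x):
    """E(x) for a random variable x given as a function on S."""
    return sum(x(s) * P[s] for s in S)

def cond_expect(x, y):
    """Return E(x | y) as the random variable s -> E(x | y(s))."""
    def value(s):
        event = [r for r in S if y(r) == y(s)]   # the event y = y(s)
        mass = sum(P[r] for r in event)          # positive for every s here
        return sum(x(r) * P[r] for r in event) / mass
    return value

x = lambda s: s * s              # square of the face shown
y = lambda s: s % 2              # parity of the face shown
f = lambda v: 10 if v else -1    # an arbitrary function of the values of y

e_x_given_y = cond_expect(x, y)

# Exercise 8: E(E(x | y)) = E(x).
assert expect(e_x_given_y) == expect(x)
# Exercise 9: E(f(y) x) = E(f(y) E(x | y)).
assert (expect(lambda s: f(y(s)) * x(s))
        == expect(lambda s: f(y(s)) * e_x_given_y(s)))
```

Every value of y has positive probability in this example, so the null-set blemish mentioned above does not arise.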

Exercise 9 and its corollary, 8, present the most frequently used
properties of conditional expectation. Exercise 10 shows that the property
presented in 9 characterizes conditional expectation. Through this
characterization Kolmogoroff [K7] extends the ideas of conditional
expectation and also of conditional probability (for countably additive
measures) to random variables y not necessarily confined to a finite or
even denumerable set of values; though the definition in terms of
ordinary conditional probability then breaks down completely, the
probability that y(s) = y often being 0 for every y.

Convex Functions

This appendix gives a brief account of convex functions in the same
spirit as the preceding one gives an account of expected value.
Reasonable facsimiles of the proofs omitted here are scattered through [H4],
where they may be found by anyone not content to skip them.

An interval is a set I of real numbers, such that, if $x, z \in I$ and
$x < y < z$, then $y \in I$. It is not difficult to see that intervals can be
classified according to Table 1, where it is to be understood that x < z.

Table 1. The various types of intervals

Symbolic       The set of real
designation    y's such that     Verbal description

(-∞, +∞)       y = y             The infinite interval (the set of all real numbers)
(x, +∞)        x < y             Open half-infinite interval
(-∞, x)        y < x             Open half-infinite interval
[x, +∞)        x ≤ y             Closed half-infinite interval
(-∞, x]        y ≤ x             Closed half-infinite interval
(x, z)         x < y < z         Open bounded interval
[x, z)         x ≤ y < z         Half-open bounded interval
(x, z]         x < y ≤ z         Half-open bounded interval
[x, z]         x ≤ y ≤ z         Closed bounded interval
[x, x]         x = y             One-point interval
               y < y             The vacuous interval (the vacuous set)


A real-valued function t defined for z in an interval I is convex, if
and only if the graph of the function never rises above any chord of
itself. Analytically, if $\rho$ and $\sigma$ are positive, $\rho + \sigma = 1$, and
$x, y \in I$, then

(1) $t(\rho x + \sigma y) \le \rho t(x) + \sigma t(y)$.

If equality holds in (1) for some $\rho$, then, as is easily verified, it holds
for every $\rho$, and t is linear, i.e., of the form $\alpha z + \beta$, in the closed
interval [x, y]. An interval in which t is linear will here be called an
interval of linearity. If and only if there are no intervals of linearity
other than the one-point and vacuous intervals, t is strictly convex.


Exercises 

1. Verify, at least graphically, that the following functions are
convex in the indicated intervals; discuss their intervals of linearity; and
say which are strictly convex.

I = (-∞, +∞):
(a) … for every ρ,
(b) … + ρx + σ for every ρ and σ,
(c) |x|,
(d) |x|^ρ for ρ > 1,
(e) x.

I = (0, ∞):
(f) -log x,
(g) … for -∞ < ρ < 0.

I = (-1, +1):
(h) (1 - …
(i) 1 - cos(πx/2).


2. In an interval where t is convex, if $d^2 t(x)/dx^2$ exists at x, then
$d^2 t(x)/dx^2 \ge 0$; and if, for every x in an interval I, $d^2 t(x)/dx^2$
exists and is non-negative, then t is convex in I.

3. Re-explore Exercise 1 in the light of 2.

4. Let T be a non-vacuous set of functions, t, t', ..., convex in I,
and let

(2) $t^*(s) = \sup_{t \in T} t(s)$.

In (2), as always in mathematics, the sup, or supremum, of a set of
numbers is the least number, possibly $\infty$, that is not less than any
element of the set. If $t^*(s) < \infty$ for every $s \in I$, then $t^*$ is convex in I.
Explore the proposition just stated, first graphically, especially for a
finite set of linear t's, and then analytically. What if the elements of
T are all strictly convex?

5. In an open interval where t is convex, it is also continuous. What
are the facts for closed and half-closed intervals?

6. If t is convex in I, $x_k \in I$, $\rho_k > 0$, and $\sum_k \rho_k = 1$, where
k = 1, ..., r; then

(3) $\sum_k \rho_k t(x_k) \ge t(\sum_k \rho_k x_k)$.

Equality obtains, if and only if all the $x_k$'s are in a single interval of
linearity of t.

(a) Interpret the propositions above in terms of probability.

(b) Prove them by arithmetic induction on r.

(c) What if t is strictly convex?

Exercise 6 suggests, and indeed proves a special case of, the following
well-known and most useful theorem, which cannot be proved here in
full generality.

Theorem 1. If t is convex and bounded in the interval I, and $x(s) \in I$
for all $s \in S$, then

(4) $E(t(x)) \ge t(E(x))$.

Equality obtains, if and only if the values of x are with probability one
contained in a single interval of linearity of t. Here and throughout this
appendix, such conditions for equality are to be understood to apply
only in the event that either P is countably additive or the random
variable is with probability one confined to a finite set of values; the
general situation for finitely additive measures is a little more
complicated.
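
Inequality (4) and its condition for equality can be illustrated on a random variable confined to a finite set of values. This is a sketch only: the three-point distribution and the three test functions are assumptions chosen for the example, and `expect` is a hypothetical helper.

```python
import math
from fractions import Fraction

# Hypothetical distribution of x: value -> probability.
dist = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}

def expect(g):
    """E(g(x)) for the finite distribution above."""
    return sum(g(v) * p for v, p in dist.items())

mean = expect(lambda v: v)            # E(x)

square = lambda v: v * v              # strictly convex
neg_log = lambda v: -math.log(v)      # strictly convex on (0, oo)
affine = lambda v: 3 * v - 2          # linear everywhere

# E(t(x)) - t(E(x)) is non-negative for each convex t ...
gap_square = expect(square) - square(mean)
gap_neg_log = expect(neg_log) - neg_log(mean)
# ... and vanishes when all values lie in one interval of linearity.
gap_affine = expect(affine) - affine(mean)

print(gap_square, gap_neg_log, gap_affine)
```

With t(x) equal to the square of x, the gap is exactly the variance defined in Exercise 7 below.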

More exercises 

7. The variance of x, often written V(x), is defined thus:

(5) $V(x) = E([x - E(x)]^2)$.

Show that

(6) $V(x) = E(x^2) - E^2(x) \ge 0$,

with equality if and only if P(x(s) = E(x)) = 1.

8. Show that, if x is never smaller than some positive number,

(7) $-\log E(x^{-1}) \le E(\log x) \le \log E(x)$.

When can either equality obtain? Write the analogue of (7) suggested
by (3), and show thereby that (7) is a generalization of the familiar
fact that the arithmetic mean (of positive numbers) is at least as great
as the geometric mean and the geometric mean is at least as great as
the harmonic mean.
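
The ordering of the three means is easy to observe numerically. The sketch below assumes positive values carrying arbitrary positive weights, which are normalized into probabilities; the function name `weighted_means` is illustrative.

```python
import math

def weighted_means(values, weights):
    """Return (harmonic, geometric, arithmetic) means of positive values."""
    total = sum(weights)
    probs = [w / total for w in weights]
    arithmetic = sum(p * v for p, v in zip(probs, values))
    geometric = math.exp(sum(p * math.log(v) for p, v in zip(probs, values)))
    harmonic = 1.0 / sum(p / v for p, v in zip(probs, values))
    return harmonic, geometric, arithmetic

h, g, a = weighted_means([1.0, 4.0, 16.0], [1, 1, 1])
assert h <= g <= a
print(h, g, a)
```

When all the values coincide the three means agree, which is the equality case asked about in the exercise.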

One of the most famous of all inequalities is the Schwartz inequality,
which can, though not quite obviously, be derived from Theorem 1,
and which can be stated in terms of expected values thus:

(8) $E^2(xy) \le E(x^2) E(y^2)$,

with equality obtaining if and only if for some numbers $\rho$ and $\sigma$ not
both zero

(9) $P(\rho x(s) = \sigma y(s)) = 1$.

Note that (9) expresses (perhaps too compactly) that, except on some
set of probability zero, either x or y vanishes identically or else each
is a fixed multiple of the other.

Statistically speaking, the Schwartz inequality expresses, in effect,
the familiar fact that any correlation coefficient must lie between +1
and -1, one of the extremes occurring if and only if at least one of the
two random variables involved is a linear function of the other.
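
Both the inequality (8) and its statistical reading can be checked on paired values with given probabilities. The following sketch uses hypothetical helper names (`expect`, `schwartz_sides`, `correlation`) and made-up data; the correlation coefficient is computed by applying the same expected values to deviations from the means.

```python
import math

def expect(values, probs):
    """Probability-weighted sum of the values."""
    return sum(v * p for v, p in zip(values, probs))

def schwartz_sides(xs, ys, probs):
    """Return the two sides of (8): E(xy)^2 and E(x^2) E(y^2)."""
    e_xy = expect([x * y for x, y in zip(xs, ys)], probs)
    e_xx = expect([x * x for x in xs], probs)
    e_yy = expect([y * y for y in ys], probs)
    return e_xy ** 2, e_xx * e_yy

def correlation(xs, ys, probs):
    """Correlation coefficient of the paired values."""
    mx, my = expect(xs, probs), expect(ys, probs)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    cov = expect([a * b for a, b in zip(dx, dy)], probs)
    var_x = expect([a * a for a in dx], probs)
    var_y = expect([b * b for b in dy], probs)
    return cov / math.sqrt(var_x * var_y)

probs = [0.25, 0.25, 0.25, 0.25]
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]

lhs, rhs = schwartz_sides(xs, ys, probs)
assert lhs <= rhs
assert -1.0 <= correlation(xs, ys, probs) <= 1.0
```

Replacing ys by a linear function of xs drives the correlation to one of the extremes, in line with the closing remark of the paragraph above.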

The concept of convex functions and its implications can easily be
extended to real-valued functions defined on vectors in an n-dimensional
vector space, the role of intervals there being replaced by convex
subsets of the vector space, but an understanding of this extension, though
desirable, is not absolutely essential in reading this book.

One good introduction to convex subsets of vector spaces is Sections
16.1-2 of [V4], and another especially adapted to statistical
applications is incorporated in [B18]. The standard treatise on the topic is
that of Bonnesen and Fenchel [B20].

The bibliography of about 170 items that terminates this appendix
lists not only all works referred to in this book but also some others,
for it is intended to serve not only as a mechanical aid to reference, but
also as a briefly and informally annotated list of suggested readings in
the foundations of statistics. In addition to the notes incorporated
into the bibliography, information about many of the works listed there
is given in other parts of the book, where it can be found by referring
to the author’s name in the author index.

Some readers may be interested in referring to larger or more
specialized bibliographies than the one given here. The next few paragraphs
are for their guidance.

Todhunter has abundant references scattered in chronological order
through [T3], emphasizing the mathematical aspects of probability up
through the period of Laplace. Keynes, in [K4], gives a formal
bibliography which purposely does not overlap Todhunter’s material very
extensively, the emphasis being on more philosophical aspects of
probability and on the period between Laplace and Keynes. Carnap in
[C1] also gives a formal bibliography, which emphasizes publications
since Keynes. Carnap promises an even fuller bibliography in the
projected second volume of his work, and he recommends the
bibliography of Georg Henrik von Wright in [V5].

Bibliographies of statistics proper are of some, though diluted,
relevance. Of these, the most useful is that of M. G. Kendall in Vol. II
of [K2]. Carnap at the beginning of his bibliography gives reference to
some other statistical bibliographies. The enormous work of O. K. Buros
in statistical bibliography, [B23], [B24], and [B25], should also be
mentioned. His volumes bring together pointed excerpts from reviews
of statistical books. Buros also directed a bibliographic department,
entitled “Statistical Methodology,” in the Journal of the American
Statistical Association from September 1945 to September 1948, listing
current articles, books, theses, and chapters dealing with statistics. In
Volume 20 (1949) of the Annals of Mathematical Statistics, an important
journal of statistical theory, there are two cumulative indexes of
Volumes 1-20, one arranged by author, the other by subject.

Communication, 68
Concave function, 94
Conditional probability, wide sense, 52
Confidence interval, 261 
Confidence level, 261 
of a tolerance interval, 262
Consequence, 13, 14
generic symbols for, 14
variety of, 14 

Consequences, ignorance of, 15 
symbol for set of, 14 
Consideration, cost of, 30 
Consistent sequence of estimates, 226 
Constant acts, 25 
Containing events, 11
Contraction, 128ff
of an observation, 112
of a set of acts, 113
Convex function, 94, 266ff
strictly, 267

Convex set of gambles, 75 

Convex sets, 269 

Correct act, 164 

Correct estimate, 230

Cost of consideration, 30 

Cost of observation, 116, 118, 214, 215 

Countable additivity, 40, 43, 78 

Cramér-Rao inequality, 238

Decision, 13 
after observation, 23 
logic and, 6 

Decision problem, group, 172ff
and observation, 210
objectivistic, 172ff
Decisions, consecutive, 15, 16
Definitive observation, 127, 133, 212
Degree of conviction, 30
Democracy, 175
De Morgan's theorem, 13
general, 13
Derived act, 106
definition of, 111
Derived decision problem, 106
Derived problem, 209
Design of experiments, 16, 105, 116 
Dichotomy, 121 
Differential information, 236ff 
Disagreement between people, 67, 68 
Dominance, 115 
in theory of games, 197
of one test by another, 148 


Dualistic views on probability, 2, 51, 62,
63

Duality principle, 185
of Boolean algebra, 12
of personal probability, 78
of theory of games, 185, 186 

Efficient sequence of estimates, 227, 
242ff 

Empirical interpretation of postulates,
19, 20 

Epsilon, Poison, 11 
vertical, 11 
Equal events, 11 
Equity, 63, 92 

Equivalence, of sets of acts, 113 
of tests, 148 
Equivalent acts, 19 
Equivalent observations, 112 
Equivalent sequence of events, 52 
Error, mean-square, 224
see also Root-mean-square error and
Squared error

of an estimate, definition of, 227
Errors of first and second kind, 140,
247 

Estimation, interval, 259 
point, 220ff 
definition of, 221

Estimation decision problem, 229ff
Event, complement of, 11
definition of, 10
examples of, 10
generic symbols for, 11
null (or virtually impossible), 24
universal, 10
vacuous, 10

Events, almost equivalent, 37
containing, 11
equal, 11 

intersection of, 11 
union of, 11 

Expectation, conditional, 264 
Expected value, 263ff 
definition of, 263
Experience, 44, 46, 55, 62
Experiment and observation, 117, 118
Extension, of an observation, 112
of a set of acts, 113
Extreme 129
Factorability criterion for sufficiency,
130ff
Fair coin, 33
Fiducial probability, 262 
Fine, 37, 40 

Foundations of sciences, role of, 1 
Foundations of statistics, deep, 5 
history of, 1ff
shallow, 5

Gamble, 70, 71
Gambling, 63, 64, 91, 94
Gambling apparatus, 66 
Game, abstract, 184ff 
bilinear, 186ff 
standard, 178ff 
two-person, 178ff 

Games, in relation to minimax theories
of decision, 180ff 
mathematics of, 184ff 
theory of, 156, 178ff 
Given, 22, 44 
Grand world, 84 
Greek fonts, 11 
Group, mathematical, 193 
Group action, 105 
Group decision problem, 172ff 
and observation, 210 
Group minimax rule, 207

Hausdorff moment problem, 53, 55, 152
Homogeneous coordinates, 136
Hyper-utility, 75
Hypothesis, alternative, 247 
extreme null, 254 
null, 247 

Income, 163 
negative, 164, 169, 170 
and loss, 182, 200 
personal, 173 
Inconsistency, 20, 21, 57 
Indecision, 21 

Independence in qualitative probability,
44, 91 

Independent events, 44 
Independent random variables, 46 
Indifference, 17, 59 
difficulty of testing, 17 
Inductive behavior, 159 


Inductive inference, 2 
Inexact science, 59 
Infimum, 80 

Infinite sets in applied mathematics, 39,
77

Infinite utility, 81
Information, 50, 153, 235ff
differential, 236ff
Information inequality, 238
Insufficient reason, principle of, 64, 65, 
193 

Integral, 263 

Interrogation, behavioral, 28 
intermediate mode of, 28 
strictly empirical, 28, 29 
Intersection of events, 11 
Interval, 266 
Interval estimation, 257 
definition of, 259, 260
Interval of gambles, 75
Interval of linearity, 267
Invariance of a game, 194ff
Invariant minimax, 197, 198
Irrelevant, 126
utterly, 126
Irrelevant event, 44
tion, 270
Judgment, 156 

Large numbers, strong law of, 54 
weak law of, 49, 54, 91 
Learning, 44, 55
see also Experience
Lebesgue measure, 41
Likelihood ratio, 48, 135ff, 225
Likelihood-ratio test, 139, 213
Linear function, 267 
Logic, 3 
decision and, 6 

empirical interpretation of, 20 
criticism of, 20 
incompleteness of, 59
normative interpretation of, 20
Logical behavior, implications of, 7, 8, 20
“Look before you leap” principle, 16
criticism of, 16, 17
Loss, 163, 164, 169, 170
personal, 174
Loss, uniformity of, 166, 174
Loss and negative income, 182, 200

Marginal utility, 103, 104
diminishing, 94ff

Mathematical expectation, principle of,
91, 92
Maximin, 184

Maximum-likelihood estimate, 140, 203,
222ff, 241
definition of, 225
Mean-square error, 224

see also Root-mean-square error and
Squared error 

Measurable random variable, 45
Median, 228
Microcosm, 86
Minimax, 184
Minimax act, 164
Minimax equality, 179, 187
Minimax estimate, 232, 240, 241
Minimax rule, 157, 180ff
and simple ordering, 205
group, 174ff, 207
objectivistic, 164ff
definition of, 164
illustrations of, 164ff
objectivistic motivation of, 168, 169
Minimax rules, criticism of, 200ff
Minimax test, 249, 250
Minimax theories, mathematics of, 184ff
Minimax theory, 156
objectivistic, definition of, 165
objectivistic approach to, 158ff
Minimax theory and observation, 208
Minimax value, 164
Mixed act, 162, 163 
in group decision problem, 173 
Mixed acts in statistics, 213, 216, 217ff 
Mixture of gambles, 71 
Moment problem, Hausdorff, 53, 55, 
152 

Moral expectation, 93, 94 
Moral worth, 93ff 

Multipersonal considerations, 122, 124, 
126, 127, 148, 154ff, 172ff
see also Agreement, Certainty, and
Disagreement

Multiple observation (or statistic), 111
counting of, 133 


Necessary statistic, 137, 224 
Necessary views of probability, 3, 60, 61,
67

Negative income, 164, 169, 170
and loss, 182, 200
Neyman-Pearson school, 140
Neyman-Pearson theory of testing, 252
Non-Archimedean probability, 39
Normal distribution, 132, 222
Normative interpretation, of postulates,
19ff

of theory of utility, 97
Normative theory, 102
Nuisance parameter, 223
Null event, 24, 26 
Null hypothesis, 247 
extreme, 254 
Null observation, 112 

Objectivistic decision problem, 159
Objectivistic observational problem, 208
Objectivistic views of probability, 3, 60,
61, 67, 253, 254
central difficulty of, 4
probability of isolated propositions
under, 4

Observation, 105ff, 125ff
cost of, 116, 118, 169, 214, 215
decision after, 23
definition of, 110

Observational problem, objectivistic, 208 
Observation and experiment, 117, 118 
Observed value, 110 
Obtains, 10 

Operating characteristic, 248 
Optimism, 68
Order statistic, 132

Parameter, 221
nuisance, 223
Partial ordering, 21
Partition, 24
almost uniform, 34
Partition formula, 45
Partition problems, 120ff
Personalistic view, 56
difficulties with, 57
possible incompleteness of, 59
Personalistic views of probability, 3, 67
Personal probability, 27, 30
Personal probability, criticism of verbalistic
approach to, 27, 28
other terms for, 30
Person as economic unit, 8
Pessimism, 68 
Plan as a single decision, 16 
criticism of, 16, 17 
Point estimation, 220ff 
definition of, 221
Poisson distribution, 222
Power function, 248
Preference, 17
as simple ordering, 18
as partial ordering, 21
conditional, 22

superfluous for consequences, 25, 26
irreflexivity of, 17
transitivity of, 18
Preference among consequences, 25
distinguished from preference among
acts, 25
Pre-statistics, 5
Primary act, 163
Prize, 31

Probabilities of higher order, 58
Probability, mathematical properties of,
2, 3

unknown, superfluousness of in personalistic
theory, 50, 51
views on, dualistic, 2, 51, 62, 63
necessary, 3, 60, 61, 67
objectivistic, 3, 60, 61, 67, 253, 254
personalistic, 3, 67
see also Personalistic view
Probability measure, 33
Probability space, 45
Propositions, probability of, under objectivistic
views, 4, 27, 61, 62
Pseudo-microcosm, 86
Psychological probability, 30

Qualitative probability, definition of, 32
example, 28
fine but not tight, 41
neither fine nor tight, 41
tight but not fine, 41
Quantitative probability, 33

Randomization, 66, 163, 216, 217 
Random numbers, 67 


Random variable, 45 
real, 263 

Rational behavior, 7 
Ray, 135 
Regret, 163 
Rejecting, 247 
Root-mean-square error, 257 
see also Mean-square error and Squared 
error 

St Petersburg paradox, 93ff 
Schwartz inequality, 269
Science, almost exact, 101
Sequential analysis, 116, 142ff, 215, 216
Sequential observational program, 142
Sequential probability ratio procedure,
146

Significance level, 252
reporting of, 256
Significance tests, 246ff
Simple dichotomy, 138, 145, 146, 148, 
212, 213, 252 
Simple ordering, 18 
and the minimax rule, 205 
exercises on, 19 
Size of a test, 250 
Small world, 9, 16, 82ff
Squared error, 81, 234 
see also Mean-square error and Root 
mean-square error 
Standard deviation, 257 
Standard game, 178ff 
Standard sequence of observations, 227 
State, 9 
true, 9 

States, generic symbols for, 11 
Statistic, 128 

Statistics, other names for, 2 
scope of, 2 

Statistics proper, 5, 105, 114, 121 
definition of, 154
Strategy function, 111
Strictly convex function, 267
Subjective probability, 30
Sufficient statistic, 129ff, 212, 224, 230,
237, 246, 256, 259
factorability criterion for, 130ff
Supremum, 80, 267

Sure personal probabilities, 57, 58, 66
Sure-thing principle, 21ff, 114, 207
Symmetric dual, 78
Symmetric sequence of events, 50ff
Symmetry, 232, 246 
in probability, 63ff 
of games, 193ff 

Tastes, 155 
Team mate, 132 
Test, definition of, 247
of hypotheses, 246ff
Testing, 221
Testing problem, 247
Ties in rank, 219
Tight, 37, 40

Time in theory of decision, 10, 17, 23,
44

Tolerance interval, 262 
Tolerance level, 262 

Topological assumptions possible for a 
simple ordering, 18
Transitivity, 19
True state, 9

Unbiased estimate, 203, 224, 244, 245
definition of, 226
Unbiased test, 249
criticism of, 250
Uniform distribution, 131
Union of events, 11
Universal event, 10
symbol for, 11 
Utile, 82 


Utility, 69 

and the minimax rules, 201ff
bounded, 95
criticism of, 91ff
definition of, 73
history of, 91ff
logarithmic, 94, 95
probability-less, 91, 95, 96
Utterly irrelevant observation, 126, 212,
237 

Vacillation, 21 
Vacuous event, 10 
symbol for, 10, 11 
Vagueness, 59, 168, 169 
Value of observation, 151 
Variance, 268 
Venn diagram, 12

Verbalistic and behavioralistic outlooks,
17 

Verbalistic outlook, 159ff, 220, 260, 261 
inadequacy of in definition of personal 
probability, 27, 28 
Virtual extension, 148 
Virtually equivalent acts, 148 
Virtually impossible event, 24 

World, choice of, 9 
definition of, 9
examples of, 8 
grand, 84 
small, 9, 16, 82ff 



